A COMPARISON OF OBLIQUE AND
ORTHOGONAL FACTOR SOLUTIONS 


RICHARD W. COAN 
University of Arizona 


THE CHOICE OF rotational methods is one of 
the major unresolved issues in the field of factor 
analysis. In this country, the principal discrepant 
viewpoints on this matter seem best represented by 
Guilford and Cattell. The position of Cattell (2) is 
essentially that oblique factors better represent the 
fundamental psychological entities with which we
are dealing, for these entities as they occur in nature
are correlated. Guilford (3) feels less certain
that this alleged fact has been demonstrated
but grants that our data may eventually force us to 
an oblique-factor model. He prefers meanwhile to 
deal with orthogonal factors on the ground that the 
appearance of obliqueness is commonly a conse- 
quence of inadequate test sampling or of insufficient 
factor extraction. 

A common argument for oblique rotation is that 
it enables us to achieve better simple structure, or 
a better approximation to simple structure, than is 
possible with orthogonal rotation. But the issue
cannot be resolved on the grounds of economy alone, 
for each rotational method affords simplicity on one 
level at the expense of simplicity on another. The 
one economy cannot readily be weighed against the 
other, since simplicity of factor pattern (or struc- 
ture) and correlational independence of factors are 
not commensurate phenomena. If we look beyond 
such technical obstacles as the sampling of tests
and persons and envision the state of affairs that we 
should have with optimal sampling from both popu- 
lations, we can see that the ultimate solution to the 
controversy depends on establishing a meaningful 
fit for our data. The fundamental question is not 
the economy of the mathematical model, but rather 
its elegance and theoretical productivity. The latter
will be best provided by the model that best enables
us to reconcile hypotheses arising from a
multitude of related, but independent, factorizations.
In this paper, we shall not attempt to resolve
the issue of factor reproducibility, but we
shall consider a number of subsidiary problems.

To approach a solution to the problem of rota- 
tional methods, we must consider the relationship 
between the alternative procedures with reference 
to contexts in which we already possess good information
regarding factor structure. We can most
justifiably assume the possession of such information
when we are dealing with simple physical measurements.
Thurstone’s box problems provide a
useful illustration. In the earlier of these (7), 
Thurstone assigned scores on 20 variables to 20 hy- 
pothetical boxes, each variable being a function of 
one or more of the three fundamental measurements. 
In a later study (8), scores on 26 similar variables 
were obtained for a sample of 30 actual boxes. Each 
study culminated in three correlated factors which 
were clearly identifiable as corresponding to the
fundamental dimensions of height, length and width. 
In the hypothetical case, of course, Thurstone was 
led to an oblique solution because he had deliberate- 
ly allowed the fundamental measurements to corre- 
late as they would in an ordinary empirical sample. 

In the box-problem situation, it is clear that or- 
thogonal rotation in three dimensions would not 
yield factors which correspond quite so neatly to 
the three known factors. Thomson (6), however, 
has thrown some light on the relationship between 
oblique and orthogonal factors in this realm. Using
a simplified form of Thurstone’s earlier box problem,
with eight boxes and seven variables, he demonstrates
that the oblique solution can be converted
to an equivalent orthogonal solution with four factors.
His calculations yield three simple-structure
factors which correspond to Thurstone’s factors 
and one general factor which corresponds to the 
second-order size factor which Thurstone’s proce- 
dures should yield. Thomson’s analysis adds some 
weight to the argument that apparent obliqueness 
may result from incompleteness of factor extraction,
since his orthogonal solution could have been
obtained directly by rotation in four dimensions. Not
many ‘‘orthogonalists’’ will rotate in four dimensions,
of course, when tests of completeness of extraction
clearly indicate three factors.

On a similar mathematical basis, Schmid and 
Leiman (5) have more recently proposed converting 
oblique solutions to hierarchical orthogonal solu- 
tions. Their procedure yields as many orthogonal 
factors of a common order as there are factors of 
all orders combined in the oblique solution. Corresponding
to the oblique factors of highest order,
there will be very general factors. Lower orders
yield orthogonal factors of less generality. There
will be a set of simple-structure orthogonal factors
which resemble the first-order oblique factors in
pattern. Schmid and Leiman feel that their hierarchical
model preserves the desirable features of
oblique solutions but eliminates the obscurity of
information inherent in the use of correlated axes.
To follow the procedure of Schmid and Leiman, of
course, we must make an assumption which is basic
to the execution of any higher-order analysis --
namely, that the departures from orthogonality in
our original oblique solution can be meaningfully
related to recognizable characteristics of the tested
population.
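
The conversion of a two-level oblique solution to a hierarchical
orthogonal one may be sketched as follows. This is a minimal
illustration, assuming a single higher order whose factors are
mutually orthogonal; the function name schmid_leiman and all numerical
values are hypothetical and are not taken from Schmid and Leiman (5).

```python
import numpy as np

def schmid_leiman(pattern, second_order):
    """Orthogonalize a two-level oblique solution (illustrative sketch).

    pattern      : (n_vars, k) first-order oblique factor pattern
    second_order : (k, m) loadings of the k first-order factors on
                   m mutually orthogonal second-order factors
    Returns (general, group): loadings on the m general factors and on
    the k residualized first-order factors.
    """
    pattern = np.asarray(pattern, dtype=float)
    second_order = np.asarray(second_order, dtype=float)
    # Part of each first-order factor not accounted for at the second order.
    uniqueness = 1.0 - np.sum(second_order ** 2, axis=1)
    if np.any(uniqueness < 0):
        raise ValueError("second-order communalities exceed unity")
    general = pattern @ second_order
    group = pattern * np.sqrt(uniqueness)  # column j scaled by sqrt(u_j)
    return general, group

# Hypothetical values: six variables, three first-order factors,
# one second-order factor.
pattern = np.array([[.8, 0, 0], [.7, 0, 0], [0, .8, 0],
                    [0, .6, 0], [0, 0, .7], [0, 0, .9]])
second_order = np.array([[.7], [.6], [.5]])

# Factor intercorrelations implied by the second-order loadings.
phi = second_order @ second_order.T
np.fill_diagonal(phi, 1.0)

general, group = schmid_leiman(pattern, second_order)
orthogonal = np.hstack([general, group])  # 1 + 3 = 4 orthogonal factors
```

Under these assumptions the four orthogonal factors reproduce the same
common-factor correlations as the three correlated factors, which is
the sense in which the two representations are equivalent.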

From the work of Thomson (6) and of Schmid and 
Leiman (5), it is evident that a cleanly rotated ob- 
lique solution can be converted to an orthogonal so- 
lution that preserves much of the simplicity of the 
oblique factors. It does not follow, however, that 
independent applications of oblique and orthogonal 
rotation by ordinary procedures will result in factors
which match so neatly. Much of the controversy
over the choice of methods relates to the relative
ease with which we can identify or interpret
factors derived directly from an oblique or orthogonal
solution. It is partly at this issue that the
present study is aimed. 

Existent evidence affords a meager basis for hy- 
potheses regarding differences between oblique and 
orthogonal solutions in factor content. One fairly 
safe prediction, from Thomson’s analysis and from 
broader mathematical considerations, is that orthogonal
rotation will tend to produce general factors
corresponding to those found at the second-order
level in an oblique solution.
(The relationship will, of course, be more complex
if there is appreciable obliqueness among second-order
factors.)

As a further hypothesis, we might take the oblique
rotator’s contention that his factors are easier
to interpret than those of an orthogonal solution.
It would be difficult to establish satisfactory criteria
for testing such an hypothesis, however, even
if we treated the behavior of interpreters as a focal
dependent variable. The related problem of factor
reproducibility is more readily subject to experi- 
mental test, though the data to be reported here 
bear less directly on this. At the present stage of 
research, it may be more useful to keep such gen- 
eral considerations in mind and simply see what 
develops in an essentially exploratory study. Our 
findings should suggest more refined hypotheses 
that can be subjected to more careful scrutiny in 
subsequent research. 


The Present Problem 





For this study, the writer sought a realm of 
physical data in which satisfactory a priori know- 
ledge of structure could be assumed. Something a 
bit more complex than the box problem seemed desirable,
so that some divergence like that usually
found between oblique and orthogonal solutions
might be expected. The chosen sample consisted
of 100 chicken eggs. These were selected so as to
cover a substantial size range. As commercially
graded, they included 24 small, 24 medium, 28
large, and 24 jumbo eggs.

For each subject, these six measurements were 
obtained: (a) weight, (b) linear length, (c) maximum 
linear breadth, (d) linear distance from the axis of 
maximal breadth to the small (i.e., sharper) end of 
the egg, (e) lengthwise circumference, and (f) maximal
breadthwise circumference.

These measures constituted variables 1 through 
6 respectively. In addition, the ratio for every pair
of the original measures was determined, giving us
the following 15 variables, which were numbered
from 7 to 21:

 7. a/b        15. b/f
 8. a/c        16. c/d
 9. a/d        17. c/e
10. a/e        18. c/f
11. a/f        19. d/e
12. b/c        20. d/f
13. b/d        21. e/f
14. b/e

The scores for each of the 21 variables were 
normalized, and intercorrelations were then deter- 
mined by use of the Pearson product-moment for- 
mula. The advisability of normalizing data which 
are already expressed in terms of physical scales 
of equal units will be questioned by some, but we 
may sidestep this area of controversy by noting that 
departures from normality in the original distributions
were generally slight. The effects of normalization
on our final results were probably inconsequential.
The intercorrelations are shown in Table
I.
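
The construction of the 21 variables and their intercorrelations can
be sketched as follows. The measurements here are simulated stand-ins
(hypothetical values, not the egg data), and the rank-based normal
scores are only one plausible reading of "normalized"; the
normalizing procedure actually used is not specified above.

```python
from itertools import combinations
from statistics import NormalDist
import numpy as np

def normalize(scores):
    """Rank-based normal scores (an assumed normalizing procedure)."""
    scores = np.asarray(scores, dtype=float)
    ranks = scores.argsort().argsort() + 1       # ranks 1..n
    quantiles = (ranks - 0.5) / len(scores)      # strictly inside (0, 1)
    inv = NormalDist().inv_cdf
    return np.array([inv(q) for q in quantiles])

def build_variables(measures):
    """measures: dict of six equal-length arrays keyed 'a' through 'f'."""
    names = list(measures)
    columns = [np.asarray(measures[k], dtype=float) for k in names]
    for x, y in combinations(sorted(measures), 2):   # a/b, a/c, ..., e/f
        names.append(f"{x}/{y}")
        columns.append(np.asarray(measures[x], dtype=float)
                       / np.asarray(measures[y], dtype=float))
    return names, np.column_stack(columns)

# Simulated measurements sharing a common size component (hypothetical).
rng = np.random.default_rng(0)
size = rng.normal(0, 1, 100)
scales = [60, 5.8, 4.3, 3.2, 16.0, 13.5]
measures = {k: s * np.exp(0.1 * size + 0.03 * rng.normal(0, 1, 100))
            for k, s in zip("abcdef", scales)}

names, raw = build_variables(measures)
normalized = np.column_stack([normalize(col) for col in raw.T])
R = np.corrcoef(normalized, rowvar=False)        # 21 x 21 Pearson matrix
```

The ordering produced by the pairwise enumeration agrees with the
numbering of variables 7 through 21 given above (for example, the
thirteenth variable is b/d and the nineteenth is d/e).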

The correlation matrix was factored by the com- 
plete centroid method. For this purpose, unities 
were inserted in the cells of the principal diagonal. 
It should be noted that theoretically the communal- 
ities of our 21 variables are actually unity. Since 
each original measurement enters into six different 
variables, what would otherwise be specific and er- 
ror variance has been converted into common-fac- 
tor variance. An iterative procedure would yield 
obtained communalities which fall just short of unity 
because of minor inconsistencies introduced by the 
scaling and rounding of scores. If a raw-score
correlation formula had not been used, the minor effects
of grouping would have been the principal
source of error variance.
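
A simplified version of centroid extraction with unities in the
diagonal is sketched below. The sign-reflection rule used here
(repeatedly reflecting the variable with the most negative
off-diagonal column sum) is an assumption made for the sake of a
compact illustration; hand procedures for reflection varied, and the
small correlation matrix is hypothetical.

```python
import numpy as np

def centroid_factors(R, n_factors):
    """Extract factors by a simple form of the complete centroid method."""
    residual = np.array(R, dtype=float)
    n = residual.shape[0]
    loadings = np.zeros((n, n_factors))
    for k in range(n_factors):
        signs = np.ones(n)
        # Reflect variables until no off-diagonal column sum is negative.
        while True:
            reflected = residual * np.outer(signs, signs)
            off_sums = reflected.sum(axis=0) - np.diag(reflected)
            worst = np.argmin(off_sums)
            if off_sums[worst] >= -1e-12:
                break
            signs[worst] *= -1
        total = reflected.sum()
        if total <= 1e-12:
            break  # nothing left to extract
        # Centroid loadings: column sums over the root of the grand sum.
        column = reflected.sum(axis=0) / np.sqrt(total)
        loadings[:, k] = signs * column          # undo the reflection
        residual = residual - np.outer(loadings[:, k], loadings[:, k])
    return loadings, residual

# Hypothetical three-variable correlation matrix with unit diagonal.
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
loadings, residual = centroid_factors(R, 3)
communalities = (loadings ** 2).sum(axis=1)
```

With unities in the diagonal and as many factors as variables, the
loadings reproduce the matrix and the communalities reach unity; with
fewer factors the residual matrix shows what remains unextracted.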

A number of tests for completeness of extraction 
were applied, and these showed unusually clear a- 
greement with respect to the presence of six signif- 
icant factors. We might reasonably have expected 
this, since the 21 variables are all functions of six 
independent measurements. Each communality obtained
with six factors is barely less than 1.00. The
centroid matrix is shown in Table II. The six centroid
factors were subjected to two independent sets
of rotations--one yielding an orthogonal solution and
one yielding an oblique solution. Completely blind





TABLE I

CORRELATIONS AMONG VARIABLES

Variable            Variable Number
Number      7    8    9   10   11   12

   1       96   97   93   97   96   04
   2       65   84   62   71   84
   3       90   77   88   86   78
   4       53   73   40   59   73
   5       81   90   78   82   89
   6       93   83   91   89   80
   7            91   97   98   90
   8                 87   94   98
   9                      95   86
  10                           94


JOURNAL OF EXPERIMENTAL EDUCATION 


TABLE II

CENTROID FACTOR MATRIX

Variable Number        Centroid Factor

rotation was facilitated by the use of a digital-computer
operation whereby each successively calculated
factor or reference-vector matrix was automatically
plotted by an oscilloscope on film. Attention
was directed to the actual variable composition of
each factor only after a final solution had emerged.


The Oblique Solution 


The oblimax routine for Illiac was used to secure 
an initial rotation. Further rotations from two-dimensional
plots led to the final solution shown in
Table III. The corresponding transformation matrix
and the matrix of intercorrelations among reference
vectors are shown in Tables IV and V. The
reference-vector structure manifests an unmistakable
simple structure which could be improved only
by very minute shifts. Some readers may feel concern
about the high correlation between reference
vectors 1' and 5'. Such extreme obliqueness is ordinarily
avoided, since two factors or reference
vectors derived from most kinds of psychological
data would not be clearly distinguishable if they
were permitted to correlate so highly. But if a factor
analyst adopts the position that the obliqueness
of his factors has some valid meaning beyond the
peculiarities of his sampling of tests and subjects,
then to be consistent, he should permit any degree
of obliqueness demanded by his data. He should, in
short, insist that each factor rest on its own merits
in a position solely determined by its hyperplane--
so long as it is meaningfully separable from all
other factors. In the present case, factors 1' and
5' have distinct loading patterns as well as distinct
(but related) meanings.
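
The relation between a transformation matrix and the correlations
among reference vectors can be illustrated with the three columns of
Table IV (3', 4', and 5') that are legible, with decimal points
restored. The second function below is a generic sketch, under the
usual assumption of a square, nonsingular transformation, of how
pattern, structure, and factor intercorrelations follow from the
reference-vector structure; its name and arguments are hypothetical.

```python
import numpy as np

# Columns 3', 4', 5' of the transformation matrix as read from Table IV.
lam = np.array([
    [ .23,  .08,  .17],
    [-.29, -.05,  .15],
    [-.91,  .34,  .02],
    [-.14,  .37, -.53],
    [ .07, -.25, -.82],
    [-.07, -.82,  .06],
])

def reference_vector_correlations(transform):
    """Cosines of the angles between columns of a transformation matrix."""
    unit = transform / np.linalg.norm(transform, axis=0)
    return unit.T @ unit

C = reference_vector_correlations(lam)   # rows/columns: 3', 4', 5'

def primary_from_reference(V, C_full):
    """Pattern, structure, and factor correlations from the reference
    structure V and the full reference-vector correlation matrix."""
    C_inv = np.linalg.inv(C_full)
    d = 1.0 / np.sqrt(np.diag(C_inv))
    phi = C_inv * np.outer(d, d)         # factor intercorrelations
    pattern = V / d                      # column j divided by d_j
    structure = pattern @ phi
    return pattern, structure, phi
```

The off-diagonal entries of C come out close to -.29, -.01, and -.03,
in agreement with the corresponding entries of Table V.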

If we accept the correlation between 1' and 5' as 
permissible, then we have what may well be re- 
garded as a ‘‘unique’’ solution. The reader, of 
course, is invited to seek a better solution. It 
should be noted that, in the present solution, hyper- 
plane values are virtually confined to a range from 
-.02 to +.02. But even if we define the hyperplane
in terms of values between -.10 and +.10, we have
a set of factors which, by Bargmann’s (1) criterion,
are all significant at well beyond the .001 level.
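
Counting the variables that fall within a hyperplane of a given width
is a simple operation, sketched below. The reference-vector values are
hypothetical (they are not those of Table III), and the significance
test itself, which depends on Bargmann's tables, is not reproduced.

```python
import numpy as np

def hyperplane_counts(reference_structure, width=0.10):
    """Number of variables per factor with |loading| <= width."""
    V = np.asarray(reference_structure, dtype=float)
    return np.sum(np.abs(V) <= width, axis=0)

# Hypothetical reference-vector structure: five variables, two factors.
V = np.array([[ 0.010,  0.850],
              [-0.015,  0.600],
              [ 0.700,  0.005],
              [ 0.550, -0.080],
              [ 0.400,  0.300]])

wide = hyperplane_counts(V, width=0.10)    # hyperplane of +/- .10
narrow = hyperplane_counts(V, width=0.02)  # hyperplane of +/- .02
```

Comparing the counts at the two widths shows how much of a factor's
hyperplane depends on the more lenient definition.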

For interpretive purposes, it is useful to consider
as well the factor-pattern matrix, the factor-structure
matrix, and the matrix of factor intercorrelations,
which are shown in Tables VI, VII, and
VIII. The oblique factors pose few interpretive
problems. Considering factor structure, we see
that factors 1' and 5' are size factors. In terms of
the more distinguishable pattern characteristics,
factor 1' is seen to be more specifically a volume
factor, its pattern loadings being confined essentially
to the length and breadth measurements. Factor
5' is most simply designated as a weight factor.
There is a temptation to consider it a density factor,
since the ratio variables--7, 8, 9, 10, 11--have
higher loadings than does weight itself. The impression
is misleading, however, for part of the
unit variance assigned to variable 1 is necessarily
absorbed into factor 1'. While variances are equated
within the context of our analysis, variable 1 is
actually of greater generality (and in a sense, of
greater variance) than any of the geometric measurement
variables. In support of the weight interpretation,
we can note the high correlation between
factors 1' and 5' and the fact that variable 1 actually
has the highest correlation with factor 5' (cf.
Table VII). The equating of variances actually
causes some difficulty throughout our interpretations,
for while it may not affect patterns per se,
it produces distortions in the order of loadings
within patterns.

Factors 2', 3', 4', and 6' all relate in some way
to shape. Factors 2' and 6' relate to breadth. From 
the pattern loadings, we can see that factor 2' rep- 
resents, in reflected form, the contribution of the 
breadthwise-circumference measure. Factor 6' 
represents the contribution of the simple maximal- 
breadth measure. In neither case do we get a simple
breadth factor per se, for it is primarily the
ratios formed by the two breadth measures that actually
cluster about the factor axes. In both sets of
pattern loadings, variable 18 is most prominent. 
This variable does not actually correlate too high- 
ly with either factor, however, since it is a meas- 
ure of sidewise flattening, rather than of breadth 
as such. It must also be recognized that because
of the high correlation between maximal breadth 
and breadthwise circumference, errors of meas- 
urement and of rounding may tend to overshadow 
true variance in the ratio of the two measures. 

Factors 3' and 4' are concerned with length. 
Factor 4' is more clearly identifiable as a factor
of relative length, or simple length with volume 
partialed out. Most prominent among both the factor-pattern
and the factor-structure values is the
ratio of linear length to lengthwise circumference.

The contribution of a partial length variable (var- 
iable 4) is central to factor 3', which might be call- 
ed a length-of-long-end factor. The ratios of the 
partial-length measure to linear length and to 
lengthwise circumference (variables 13 and 19) ap- 
pear more prominent among the factor-pattern and 
factor-structure values than does variable 4 itself. 
This suggests an interpretation in terms of length- 
wise proportions, but the fact is a consequence of 
variable 4’s relevance to factor 1'. We must bear
in mind, however, in interpreting all four of the
non-size factors--i.e., 2', 3', 4', and 6'--that we
are getting at structural components which are es- 
sentially independent of gross volume. 

To complete the oblique-factor picture, we must 
consider the second-order factors which this solu- 
tion yields. Factor analysis of the factor intercor- 
relations clearly yields two, and no more than two, 
significant factors. Stable communality estimates
for the first-order factors were obtained by several 
iterations of the centroid procedure. Rotating from 
the ultimate centroid solution, we arrive at two fac- 
tors which are nearly orthogonal. The centroid 
matrix, rotated reference-vector matrix, and


TABLE III

OBLIQUE ROTATED REFERENCE VECTOR MATRIX

Variable Number        Rotated Factor

TABLE IV

TRANSFORMATION MATRIX FOR OBLIQUE
REFERENCE VECTOR SOLUTION

Centroid        Rotated Factor
Factor         3'    4'    5'

               23    08    17
              -29   -05    15
              -91    34    02
              -14    37   -53
               07   -25   -82
              -07   -82    06

TABLE V

CORRELATIONS AMONG OBLIQUE
REFERENCE VECTORS

Rotated             Rotated Factor
Factor        2'    3'    4'    5'

1'           -31    10    06   -93
2'                 -17   -48    24
3'                       -29   -01
4'                             -03


TABLE VI

OBLIQUE FACTOR PATTERN MATRIX

Variable        Rotated Factor

TABLE VII

OBLIQUE FACTOR STRUCTURE MATRIX

                Rotated Factor
Variable      3'    4'    5'

   1         -13    01    99
   2          22    52    77
   3         -30   -22    87
   4          58    51    65
   5          02    21    88
   6         -30   -20    90
   7         -26   -20    98
   8         -03    13    97
   9         -43   -20    94
  10         -18   -06    99
  11         -01    15    97
  12          63    90    01
  13         -79   -06    25
  14          62    98   -10
  15          67    96    00
  16         -93   -76    09
  17         -55   -73   -07
  18          ..   -55    00
  19          99    59   -17
  20          ..    77   -11
  21          64    83    05


transformation matrix are shown in Tables IX and
X.

Since factor resolution was so clear at the first- 
order level, the second-order analysis yields fac- 
tors which are quite readily interpretable--perhaps 
more readily interpretable than the first-order 
factors. The two factors correspond to the two fun- 
damental dimensions which we should expect to de- 
termine most of the physical differences among 
eggs. Factor I' is a general size factor, combining
the contributions of factor 1' (volume) and factor 5' 
(weight). 

Factor II' is a general shape factor which may 
be characterized as a ratio of length to width, how- 
ever these are measured. The positive loadings 
for 3' and 4' represent the positive contributions of 
linear length and a component of this. The positive 
loading for 2' and the negative loading for 6' repre- 
sent the negative contributions of breadthwise cir- 
cumference and maximal linear breadth. 


The Orthogonal Solution 





An initial orthogonal solution was performed by 
the quartimax routine. The final solution shown in 
Table XI was obtained through subsequent rotation 
from graphic plots. (The corresponding transfor- 
mation matrix is shown in Table XII.) Orthogonal 
rotation proved more laborious than oblique rotation,
since it yields no solution which can be
regarded as unique. In interpreting the present solution,
it is important to consider how it differs
from the chief alternative solutions. 

In the solution shown in Table XI, two general 
factors are evident--A and F. A would seem to be 
a general size factor, which incorporates the load- 
ings of oblique factors 1' and 5'. It corresponds to 
factor I' of our second-order solution. F, on the 
other hand, is a general shape factor which corresponds
to second-order factor II'. The loadings
clearly mark it as a factor of length vs. breadth. 

Factor B contrasts the simple volumetric var- 
iables (which have negative loadings) with the ratios 
formed by dividing weight by each of the volumetric 
measures (with positive loadings). As a positive
function of density and a negative function of volume, 
B might best be characterized as a ‘‘compactness’’ 
factor. Factor C is apparently a reflected equival- 
ent of oblique factor 3' and may be similarly inter- 
preted. Factor D, on the other hand, is equivalent 
to oblique factor 6'. In the case of both C and D, the
equivalence is one of pattern. Reference to Table 
VII indicates that the orthogonal factors are by no 
means collinear with their oblique counterparts. 

Factor E is one of the most difficult to interpret.
It is a negative function of the lengthwise-circumference
variable and also, to an extent, of the
weight variable. But the total loading pattern does
not support interpretations in terms of size, weight,
or length as such. It is probably best considered
essentially a product of variance which is specific
to the lengthwise-circumference variable.

There are several alternative solutions which appear
about equally satisfactory. Each involves a
redistribution of the variance of one of the general
factors with respect to a smaller-variance factor.
Actually, there are only two planes in which we can
make sizable shifts without detracting from the clarity
of the factor structure. We can rotate factor B
with factor A, and we can rotate factor E with factor
F. In either case, the general factor will remain
essentially unchanged in content, since it is only the
low-loading variables that move into and out of the
hyperplanes. Our interpretations of factors B and
E, however, will be altered.

It is possible to rotate factor B either clockwise 
or counterclockwise with A. If we shift in one direction
we augment the positive loadings for B and
pull the negatively loading variables into the hyper- 
plane. This gives us a factor of somewhat simpler 
meaning, which we may identify as ‘‘density.’’ If 
we shift in the other direction, the positive loadings 
drop to insignificance and we are left with more 
substantial negative loadings for the volumetric var- 
iables. The factor then has a less distinct meaning. 
It would appear to be a negative function of that portion
of volume which does not contribute to overall
mass, or of volume with weight variance partialed 
out. 
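
The kind of shift described here, in which a general factor and a
smaller factor are rotated against each other in their own plane,
can be sketched as follows. The loadings and the angle are
hypothetical and are not those of Table XI; the function name
rotate_pair is likewise an illustration only.

```python
import numpy as np

def rotate_pair(loadings, i, j, degrees):
    """Rotate orthogonal factors i and j in their common plane."""
    L = np.asarray(loadings, dtype=float)
    theta = np.deg2rad(degrees)
    c, s = np.cos(theta), np.sin(theta)
    rotated = L.copy()
    rotated[:, i] = c * L[:, i] + s * L[:, j]
    rotated[:, j] = -s * L[:, i] + c * L[:, j]
    return rotated

# Hypothetical loadings: column 0 a general factor, column 1 a smaller
# bipolar factor.
loadings = np.array([[0.90, -0.30],
                     [0.85, -0.25],
                     [0.80,  0.35],
                     [0.75,  0.40]])

rotated = rotate_pair(loadings, 0, 1, 20)
```

Because the rotation is orthogonal, each variable's communality is
unchanged; only the division of variance between the two factors, and
hence which variables lie in which hyperplane, is altered.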

For factor E, there is one alternative position
attainable through rotation with F. The shift will
give us positive loadings for the following variables
(in order of descending magnitude of loading): 14,
19, 15, 12, 20, 2, and 4. Variables 16 and 5 will
have negative loadings. The meaning of the factor
becomes no less obscure. The essential effect of
the rotation seems to be a shift in emphasis from
variance specific to the lengthwise circumference 
measure to variance specific to the linear length 
measure. 


Discussion 


The purpose of this study was to secure oblique 
and orthogonal solutions independently for a common
set of original data, so that we might make
meaningful comparisons and relative judgments that
would be applicable to the two types of rotation as
they are ordinarily performed. Insofar as rotation
itself is concerned, we have performed the necessary
operations. The skeptical reader is urged to
repeat these operations to satisfy any doubts about
the adequacy of the solutions. Criticism is more
likely to be applied, however, to the original choice
of variables. A common criticism of oblique rotation
is that apparent obliqueness is a consequence of
an inadequate sampling of the population of relevant
variables.


This criticism can rarely be met in a completely
definitive way, since it is not possible to circum- 
scribe the total population of relevant variables un- 
less we define the object of analysis itself in terms 
of a particular selection of variables. In the present
case, this would mean that our consideration of





TABLE XII


TRANSFORMATION MATRIX FOR 
ORTHOGONAL SOLUTION 





Rotated Factor 





Centroid 
Factor B Cc D E 





= a -03 
00 8=— 30 -04 
07 90 -09 

“53 921 18 

“84 -05 -23 


11 -04 -95


TABLE XI


ORTHOGONAL ROTATED FACTOR MATRIX 





Rotated Factor 





Variable C E F 





02 -06 
06 00 
03 00 
-41 -03 
04 -18 
05 00 
00 -04 
02 -10 
23 -02 
01 04 
01 -10 
01 04 
95 04 
07 39 
01 05 
50 01 
02 22 


01 -01 








dently from the present centroid matrix should ob- 
tain nearly identical solutions. We do not have the 
data necessary to indicate how reproducibility will 
be affected when we use different samples of vari- 
ables and persons (or objects) in a second factor 
analysis. 

As it happens, egg measurements were earlier 
analyzed by Muhsam (4), but the semi-artificiality
of this study and of Muhsam’s limits the meaningfulness
of any comparison. Muhsam’s final solution
contained factors which we can identify as simple
length, simple breadth, and volume with simple linear
length and breadth partialed out. These most
nearly match factors in our oblique solution, but 
this is to be expected since Muhsam’s solution was 
also oblique. There is a great need for studies 
more deliberately planned to yield information regarding
the reproducibility of oblique and orthogonal
factors. Insofar as an unambiguous solution
within each individual factorization is a prerequisite 
for reproducibility, the present evidence argues for 
the greater reproducibility of oblique factors. 

We may also compare the solutions with respect 
to the content of the obtained factors. The relation- 
ship which we anticipated initially is borne out well. 
The oblique solution culminates in two second-order
factors. In meaning, they are obviously equivalent 
to the two general factors found in the orthogonal 
solution. It may be concluded that we can ordinarily
expect general orthogonal factors to be paralleled
by second-order oblique factors, provided that
one permits sufficient obliqueness for well-defined
second-order factors to emerge. To this extent,
oblique and orthogonal solutions tend to be interchangeable.

Perhaps no one will argue that either solution
gives us the ‘‘real’’ underlying factors whereas the 
other does not, for in either case we obtain a set of 
factors with which we can account for our original 
data. The usefulness of a factor, however, depends 
somewhat on its clarity of meaning, or on the ease 
with which we can interpret it. There is no sharp 
difference between the two solutions in this respect, 
though it appears to the writer that the oblique fac- 
tors on the whole are easier to interpret and simp- 
ler in content. 

It might be expected that an orthogonal solution
would afford less interpretive confusion of factors.
This assumption appears to be erroneous. Linear
independence as such does not make for greater interpretive
distinctness. In the oblique solution,
factors 1' and 5' are most highly correlated. As
one would anticipate, they are closely related in
meaning. Yet the meanings are clearly distinguishable,
and the relationship between them makes good
sense. Some pairs of orthogonal factors (e.g., C
and E) are no more distinct in meaning, even though
they manifest linear independence.

One further difference between the two solutions
may be noted, but its implications are difficult to
evaluate. Both solutions tend to yield factors which
are essentially composed of the specific variance
which we have transformed into common-factor
variance, but the oblique solution yields these
‘‘transformed specific factors’’ in more nearly pure
form. Thus, the significantly loading variables for
oblique factors 2', 3', 4', 5', and 6' are essentially
functions respectively of the original measurements
listed above as f, d, b, a, and c. We cannot properly
make any relative judgment regarding the adequacy
of the solutions on the basis of this finding. If we
had originally chosen six measurements which were
not intercorrelated and incorporated their various
ratios into the score matrix, we would inevitably
have arrived at six such ‘‘transformed specific’’
factors. Any rotational method would have led us
to the six uncorrelated factors which we had thus
created. In the present problem, we have confounded
what were originally common-factor variance
and specific variance, and we can make no a priori
judgment as to how these should ideally be parcelled
out among a set of rotated factors.


One further matter that merits our attention is 
the occasional argument that oblique rotation yields 
inter-factor correlations that do not meaningfully 
reflect functional relationships within the individ- 
uals or objects studied. Thus, Thurstone’s box 
problem yields oblique factors of height, length, 
and breadth. Since height, length, and breadth are 
orthogonal components of any individual box, it
is the nature of the population sampled that makes 
the corresponding factors oblique. The underlying 
problem here is actually one that pervades all fac- 
tor solutions. For in R-technique, the correlations 
among variables and among factors are necessarily 
a direct function of covariation running through the 
population and only secondarily a function of func- 
tional relationships within individuals. Perhaps it 
would be useful to distinguish between object factors 
and population (or sampling) factors. We can think 
of the box problem as containing three object factors 
and one population factor. These coexist in three- 
dimensional space, since the population factor 
(‘‘general size concomitance’’) is mathematically 
expressible in terms of the three basic measures
that underlie the three object factors. 


If we think of Thurstone’s second-order factor 
as a population factor that imparts covariation to 
height, length, and breadth, it is understandable 
that only an oblique three-factor solution will yield 
the patterns of the three object factors (which are 
linearly independent within objects) in pure form. * 
To produce a comparable result orthogonally, we 
should have to add a null centroid factor and rotate 
in four dimensions, even though centroid extraction 
clearly yields only three factors. If we settle for a 
three-factor orthogonal solution, we must be con- 


* Because R-technique factors are explanatory constructs adduced to account for inter-individual differences,
we cannot always expect factors to correspond this simply, in nature or number, to the physical dimensions
of the individual object.

the structure of eggs would be limited from the outset
to those individual differences which are a function
of our six fundamental measurements. Even
without such an arbitrary delimitation of the problem,
however, we can argue that all gross variations
in any ordinary sample of eggs can be expressed
in terms of various functions of our six original
variables. Such a claim could not ordinarily be
made with respect to any realm of psychological
data.

The use of experimentally interdependent variables
provides further grounds for criticism. One
might question whether it is legitimate to apply factor
analysis at all to a correlation matrix primarily
constituted by the various ratios of only six independent
measures. Strictly speaking, of course,
the use of experimentally interdependent scores 
does not violate any assumptions underlying factor 
extraction per se. It rather affects the composition 
of the factors extracted and is thus relevant to the 
legitimacy of interpretation. 

The immediate effect of including in a score matrix 
the ratios of variables already in the matrix is the 
conversion of the unique variance of the simple 
variables into common-factor variance. With most 
psychological data, extensive use of such ratios will 
tend to produce nonsense factors, composed to a large 
extent of what would otherwise constitute error 
variance. In the present problem, error of measurement 
is minimal, and it is primarily the specific variance 
that is transformed. This does not constitute such a 
serious problem, for the logical distinction between 
specific and common-factor variance is an arbitrary 
one. In application, the dividing line is a function 
of the particular choice of 
variables. We could have accomplished precisely 
the same transformation of specific to common- 
factor variance by introducing additional measure- 
ments which were experimentally independent of the 
ones originally obtained. 

In interpreting factors, we must, nonetheless, 
keep in mind the fact that the use of ratios makes of 
the present problem one which is partially artificial. 
In extending all test vectors to unit length, we treat 
all variables, in effect, as being of equal impor- 
tance in determining the structure of our subjects. 
As we have indicated above, this probably produces 
less distortion in factor pattern as such than in the 
order of magnitude of loadings. Regardless of one’s 
judgment on these matters of variable selection, it 
would be desirable to re-examine in the light of 
other data any generalizations which we may draw 
from these data regarding rotational methods. 

An objection that has sometimes been specifically 
leveled at the use of ratio scores, that they introduce 
negative correlations and hence create bipolar factors, 
seems to the writer a little less germane. The common 
insistence on positive manifold 
in the realms of ability and physical measurement 
rests on a gratuitous assumption regarding the fac- 
tors that are to be expected. Conceptually, every 
factor must be bipolar as long as we employ a geometric 
model composed of bipolar coordinate axes. 
Even when negative loadings are absent from a so- 
lution, the negative pole of every factor has an im- 
plied meaning opposite to that of the positive pole. 
In some realms of measurement, to be sure, we 
quite readily achieve positive manifold. This does 
not happen because the ‘‘real underlying factors’’ 
are unipolar, however, but because our intellectual 
habits lead us to score variables in a direction conducive 
to this. In the physical realm, the most convenient 
operations of measurement are usually (but 
not always) consistent with these intellectual habits. 
In the psychological realm, the direction is gener- 
ally a more arbitrary thing, and often our intellec- 
tual habits dictate inconvenient operations. Thus, 
an individual time score is actually a measure of 
slowness. To secure an index of speed, we must 
perform a transformation. The resultant positive 
manifold in a speed-loaded ability factor is thus an 
artifact. There is really nothing in the nature of 
objective reality that compels us to measure speed 
rather than slowness, intelligence rather than stupidity, 
or even bigness rather than smallness. Furthermore, 
both extremes are necessarily implicit in whatever 
factor we derive, even though we take pains to make 
only one pole of the factor explicit. 

Much of the debate regarding rotational methods 
centers about the economy of either type of model. 
It is decidedly easier to attain simple structure 
through oblique rotation. In this problem, oblique 
rotation yields a clear simple structure. The or- 
thogonal solution fails to do so in that each of the 
low-variance factors tends to share its high load- 
ings with one of the two general factors. This is 
not an unusual situation. It can probably be said 
that for most of the kinds of problems to which fac- 
tor analysis has been applied, orthogonality and 
complete simple structure are incompatible goals. 

We cannot make a clear choice of methods, how- 
ever, on the basis of economy alone, for the oblique 
solution by its very nature introduces greater com- 
plexity with respect to the interrelationships among 
factors. The oblique solution best furnishes what 
might be called factor simplicity (i.e., simple 
structure), while the orthogonal solution provides 
what we might designate model simplicity. Both 
factor simplicity and model simplicity are undoubt- 
edly desirable, but since they are incommensurable 
quantities, there is no way of clearly determining 
whether the oblique or the orthogonal solution pro- 
vides greater economy. 

With respect to the clarity of coordinate-axis po- 
sitions, or the uniqueness of the solution, the ob- 
lique solution is definitely superior. It furnishes 
one best set of factors which align themselves neat- 
ly with trends evident in the test configuration. The 
location of orthogonal axes is more arbitrary. It 
can be argued, on this basis, that orthogonal fac- 
tors will manifest less reproducibility. This is 
most clearly the case when independent sets of ro- 
tations are made from the same original centroid 
factors. Two oblique rotaters working indepen- 
tent to leave the test vectors ‘‘dangling’’ in the quadrants. 
If we then assume that our three orthogonal factors 
correspond to the ‘‘real’’ independent dimensions of 
height, length, and breadth, we are forced to argue 
that each of the actual measurements of height, length, 
and breadth is contaminated with the factors 
corresponding to the other two. 
This is hardly a sensible argument if we think only 
in terms of object factors, because we know that 
the surface intersections along which we have 
measured form 90-degree angles with one another. 
It does not follow, of course, that oblique factors 
will always give us better approximations to object 
factors. Nor can we, on the other hand, simply 
dismiss the problem by assuming that the complica- 
tions that we encounter in populations of boxes and 
eggs will not be present in populations of people. In 
any event, we must exercise caution in interpreting 
inter-factor correlations, and we must not expect 
these correlations to conform rigidly to predictions 
based on our knowledge of functional relationships. 


Summary 


A comparison of oblique and orthogonal factor 
solutions was sought through use of two independent 
sets of rotations applied to a common set of factors 
derived from a physical problem. The subjects 
consisted of 100 chicken eggs. Six direct measure- 
ments and 15 ratios formed by these constituted the 
variables to be intercorrelated. Six centroid fac- 
tors were obtained. 

The oblique solution yielded two second-order 
factors which corresponded to two general factors 
in the orthogonal solution. The oblique solution, 
though utilizing a more complex mathematical 
model, afforded better simple structure. The ob- 
lique solution was found to be superior in providing 
an unambiguous, or ‘‘unique’’, position for the co- 
ordinate axes. It is suggested that the oblique so- 
lution also yields factors of greater interpretive 
clarity. 
A SCIENTIFIC METHODOLOGY 


PALMER O. JOHNSON 
University of Minnesota 


IN THE FIELD of scientific research today it 
is the statisticians who design the experimental 
programs and observational surveys. Likewise, 
it is they who analyze the results, assess the evi- 
dence and differentiate between that which has 
been clearly established and that which still needs 
verification. 


Experiment and Survey 





Two principal means of bringing scientific 
knowledge into being are the experiment and the 
survey. The real distinction between the survey 
and experiment for determining ‘‘cause-and-ef- 
fect’’ relationships is that in the experiment the 
research worker exercises control over the 
forces that are put in motion while in the survey 
he is investigating forces over which he has had 
no control. In the experiment, the population(s) 
under study are constructed in a particular way. 
In a survey dealing with the same problem, the 
population under consideration has originated 
from a set of factors whose relation to the forces 
under examination is unknown. It is the exercise 
of this ‘‘control’’ that differentiates experiment- 
ing from surveying. 

Experimentation gives rise to the most desirable 
fruits of the scientific method. In experimentation 
we are able to determine the consequences of 
altering factors, e.g., we can find out 
what effect changing a factor A has on another 
factor B such that knowledge is acquired basic to 
taking action. Surveys can only detect the exist- 
ence of associations between factors in the popu- 
lation. 

When the aim of the investigation is descrip- 
tive no such general advantages of the experiment 
prevail over the survey. Since in practice the re- 
search worker is not always free to elect the meth- 
od which might seem to be superior, he should be 
ready to employ either method. In some cases it 
may be advantageous to use a combination of both 
methods. Thus the researcher might use the sur- 
vey to explore and the experiment to study the sit- 
uation in greater detail. At times the survey may 
serve to identify factors that are worthy of exper- 
imentation. Also surveys are especially useful in 
situations in which it is very difficult or perhaps 
impossible to conduct an experiment. This has 
been particularly true in the field of human genetics. 
Thus it is that methods in science vary with 
the nature of the problem to be investigated. 


Development of the Sample Survey 








A well-known classical example of a survey is 
a census of population, of which there are some 
interesting historical antecedents. The Book of 
Numbers in the Old Testament is a simple example 
of a survey, a written record resulting from an 
enumeration or counting of the wealth of the tribe 
in terms of persons and animals. An early example 
of an economic survey, perhaps the first in England, 
was the Domesday Book. This survey was carried out 
by William the Norman, who verified the principle 
that if a country is to be efficiently governed it is 
essential to discern what comprises the wealth of 
that country. 

The U.S. Census required by the Constitution 
has been conducted every ten years since 1790. 
The original purpose of the Census was to ascer- 
tain the number of inhabitants in the United States 
and their residence to furnish the basis by which 
the number of representatives each state would be 
granted in Congress. More recently the Census 
has been put to additional uses, among them 
serving as an important source of information for 
administration and research in government, 
business, labor, and other fields. 

In a free and progressive society, business, 
the government, the professions, and increasing- 
ly, labor, all are continuously in search of the 
widest and soundest possible factual basis for mak- 
ing decisions, formulating policies, and for devel- 
oping scientific, social, and economic theory. It 
is this joint quest particularly for economic and 
social facts that accounts in a large part for the 
very great significance of and emphasis on statis- 
tics. Hundreds of millions of dollars are expend- 
ed annually by both public and private agencies. 

The traditional method for collecting social and 
economic statistics has been that of complete cov- 
erage and enumeration, i.e., the procedures of 
the Census. Theoretically, at least for those pop- 
ulation characteristics that remain relatively con- 
stant, this practice seems to be the best. How- 
ever, such an undertaking is costly, difficult to 
plan and conduct, restricted to a relatively few 
items of information, is time-consuming, and is 
liable to be out of date by the time the findings 
are published. For example, even with modern 
sorting and collecting equipment, it will take 
several years before the results of the next Census 
are processed and published, so that if interested 
persons have to wait for the published results 
there would be time for the structure of the 
population to change. This is a case where the 
collection of too much material may be at times as 
obstructing and produce consequences as misleading 
as the gathering of too little. It is of interest 
in this connection to note that in the 1940 Census, 
the Bureau of the Census introduced a sampling 
procedure by including a set of supplementary 
questions which were answered by a sample of 
one person in twenty. In fact, the statisticians 
of the Bureau of the Census have played a role of 
leadership in the rapid development of the theory 
and practice of sampling in recent years. 

It is the development of sampling that has 
made it possible to meet the needs of practical 
businessmen, public officials, social scientists, 
and others who depend upon the results of sam- 
pling for a great deal of their factual information. 
We should also point out that the exigencies of 
World War II required the collection of many 
types of data, which was possible only by the use 
of sample surveys. 


The Sample Survey as a Scientific Method 





The introduction of statistical sampling, par- 
ticularly in the field where the traditional method 
of the exhaustive census or the attempted com- 
plete count prevailed, has stimulated much inves- 
tigation. This form of inquiry has been designed 
to make sampling meet the new standards of 
reliability and efficiency required to produce useful 
results. It was the union of enumerative statistics 
with probability that has given rise to modern 
statistical methods. The introduction or use 
of probability sampling (to be shortly discussed) 
made it possible to obtain a quantitative measure 
of sampling error. The knowledge of the magni- 
tude of the sampling error led to the ways of con- 
trolling its extent. This knowledge made possible 
the development of sampling designs and the spec- 
ification of the number of observations (sampling 
units) necessary for a required accuracy, that is, 
a method to provide the desired degree of accu- 
racy at minimum cost. 
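
The link between a known sampling error and the number of observations can be sketched as follows. This is only an illustration under stated assumptions: simple random sampling without replacement, an approximately normal estimate, and a prior guess at the population standard deviation. The function name `required_sample_size` and its parameters are hypothetical and are not taken from the discussion above.

```python
import math


def required_sample_size(std_dev, margin, z=1.96, population_size=None):
    """Smallest n for which z * (standard error of the mean) <= margin.

    Assumes simple random sampling; `std_dev` is a prior guess of the
    population standard deviation, `margin` the tolerated error, and
    `z` the normal deviate for the chosen confidence level.
    """
    # Requirement for an effectively infinite population.
    n0 = (z * std_dev / margin) ** 2
    if population_size is not None:
        # Finite population correction: solve
        # z**2 * (S**2 / n) * (1 - n / N) <= margin**2 for n.
        n0 = n0 / (1 + n0 / population_size)
    return math.ceil(n0)


n_infinite = required_sample_size(std_dev=10, margin=1)
n_finite = required_sample_size(std_dev=10, margin=1, population_size=1000)
print(n_infinite, n_finite)
```

Under these assumptions a smaller population calls for fewer observations at the same accuracy, which is one way the cost of a design can be held down.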

We see here the development of sampling 
theory and practice that served to introduce rig- 
orous scientific thinking into sampling surveys. 
Since the fundamental purpose of sampling is to 
secure information about populations, the first 
matter of importance in a sampling survey is 
to begin with a clear specification of what that 
population is. It follows that it is equally important 
to have a clear understanding of what is to be 
found out about the population. This compels 





* All footnotes will be found at end of article. 





the research worker to make a critical appraisal 
of the purposes the statistics collected are to serve. 
A specification of the population and the decision 
as to the precise purpose of the investigation usu- 
ally determine certain parameters of the population 
to be estimated or certain hypotheses to be tested. 
The problem of inductive inferences is defined by 
the object of the sampling survey to secure accu- 
rate answers to certain clearly defined questions. 

The brief sketch so far made of the sample survey 
has indicated the derivation of a method which is 
the most tractable, speedy, economical, and, in 
reality, scientific method so far available for the 
purpose in mind. Many examples support the first 
three claims. The claim of its unique scientific 
character is well stated by Professor R. A. 
Fisher¹: 


why do I say that it (the sample sur- 
vey) is more scientific than the only proced- 
ure with which it may sometimes be in com- 
petition, the complete enumeration? The 
answer, in my view lies in the primary pro- 
cess of designing and planning an inquiry by 
sampling. Rooted as it is in the mathemati- 
cal theory of the errors of random sampling, 
the idea of precision is from the first in the 
forefront. The director of the survey plans 
from the first for a predetermined and known 
level of precision; it is a consideration of 
which he never loses sight, and the preci- 
sion actually attained, subject to well under- 
stood precautions, is manifest from the re- 
sults of the inquiry. 


Diverse Fields of Application 





Such a sharp and accurate tool as probability 
sampling did not meet with uniform acceptance and 
use in all fields of research. Even today, there 
is an increasing need of application of modern 
sampling survey methods to practical situations. 

The sample survey has already proved to be economical, 
of high accuracy, and especially adaptable in 
comparison with older methods, in fact-finding in 
economics, vital statistics, and in the programs of 
the U. S. Bureau of the Census. 

In the field of productive industry, the Quality 
Control Engineers have developed special uses 
known as sequential sampling by which they have 
demonstrated how the efficiency of mass production 
may be reconciled with increasing demands for 
precision and reliability of product. The assessing 
of consumer preferences has made considerable 
progress through the use of sampling, and it has 
become possible to pass these actual requirements 
on to design and production engineering. 
Some beginnings have been made on the national 
scale using probability sampling methods in the 
collection of educational statistics and occasion- 
ally by individual investigators of educational 
problems. However, most workers in this field 
as well as in psychology, sociology, and in other 
social sciences seem to be unfamiliar with or, if 
familiar, not using modern sampling methods. It 
is the status of the sampling problem in social 
science fields that motivated, at least to some extent, 
the writing of this paper. 


General Methodology in Sampling Surveys 





The planning of sampling designs is usually 
involved in two situations: experimental investi- 
gations and descriptive or analytical surveys. It 
is only the latter situation that concerns us in 
this study. It may be pointed out here, however, 
that the sweeping theoretical and technical ad- 
vances leading to the principles of modern exper- 
imental designs produced a startling advance 
as well in the development of sampling designs 
and techniques. 

Turning our attention now more specifically 
to the sampling survey, we find that the statis- 
tician must concern himself with determining the 
number of observations to be drawn from the pop- 
ulation and what method of sampling should be 
used. There is also the practical problem of se- 
lecting the method which will provide the desired 
degree of precision at minimum cost. The cen- 
tral problem of estimating an unknown parameter 
of the specified population is one of finding a func- 
tion of the observations that is the best estimate 
of the parameter. These statistical aspects as 
well as others will be encountered in a number of 
places in the sequel. 

Probability and Judgment Samples— It should 
be emphasized that we are dealing with probabil- 
ity and not judgment samples. In probability 
sampling, there are these distinguishing fea- 
tures: 





1. Every individual (primary sampling unit) 
in the sampled populations has a known probabil- 
ity of being included. 

2. The sample is drawn by a process that necessitates 
one or more acts of automatic randomization 
conformable with the probabilities in number one 
above. 

3. Weights appropriate to the probabilities in 
number one are applied in the analysis of the 
sample results. 


Probability samples may be self-weighting. 
Such is the case where samples are drawn such 
that each individual (sampling unit) in the popula- 
tion has an equal chance of being included in the 
sample. In such a sample, if it is desired to estimate 
the arithmetical mean of the population of some 
characteristic, the proper procedure is to calculate 
the unweighted mean of all the members of the 
sample. Since here the weights are equal the 
sample is said to be self-weighting. This situa- 
tion satisfies the criteria of a probability sample 
since the relative chances of different individuals 
being included in the sample are known and taken 
into account in the weighting (being in this case 
equal). However, this is not the only type of a 
probability sample. Instead of giving each mem- 
ber of the population an equal chance of being in- 
cluded in the sample and then being weighted equal- 
ly, a process of compensation may be used where 
those individuals more liable to be included in the 
sample are given less weighting while those less 
likely to enter the sample are given more weight- 
ing when they do occur. In this kind of probabili- 
ty sample, called general probability sampling, 
each individual item is given an equal chance of in- 
fluencing the (weighted) sample mean. 
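
The compensating weights described above can be sketched in a few lines. This is an illustration only: it assumes that each sampled individual's inclusion probability is known and uses weights proportional to the reciprocal of that probability, normalized by their sum, which is one common choice among several. The name `weighted_sample_mean` is hypothetical.

```python
def weighted_sample_mean(values, inclusion_probs):
    """Weighted mean with weights inversely proportional to the known
    probability that each sampled individual entered the sample."""
    weights = [1.0 / p for p in inclusion_probs]
    total_weight = sum(weights)
    return sum(w * y for w, y in zip(weights, values)) / total_weight


observed = [10.0, 20.0, 30.0]

# Self-weighting case: equal inclusion probabilities, so the weighted
# mean reduces to the ordinary unweighted mean.
self_weighting = weighted_sample_mean(observed, [0.1, 0.1, 0.1])

# General case: the first individual was twice as likely to be drawn
# as the others and therefore receives half their weight.
compensated = weighted_sample_mean(observed, [0.5, 0.25, 0.25])
print(self_weighting, compensated)
```

In the self-weighting case the result is simply the unweighted mean; in the general case the individual most liable to be drawn counts for least.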

In judgment or non-probability sampling, no 
chance system enters into the selection of the sam- 
pling elements. The sample is restricted to units 
believed by someone to be particularly typical of 
the population, or to units chosen for their convenience, 
e.g., ‘‘grabbing a handful’’ or taking a ‘‘chunk’’ 
out of the population. Such samples vary greatly 
in actual and apparent trustworthiness. This type 
of sampling makes impossible the measurement of 
the precision of the sample results from the sam- 
ple itself. The probability that an individual is in- 
cluded in the sample is unknown. No objective 
basis is known for measuring the confidence that 
can be placed in the estimates provided by such a 
sample. 

Sampling and Non-Sampling Errors—We speak 
of a sampling procedure as unbiased if the mean of 
the frequency distribution of the estimates that it 
produces is exactly equal to the population param- 
eter that is being estimated. 
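
The definition of an unbiased procedure can be checked directly on a population small enough to enumerate. The sketch below is illustrative only: the five population values are invented, samples of two are drawn without replacement with every sample equally likely, and the sample maximum is included merely as a contrasting example of a biased estimate (of the population maximum).

```python
from fractions import Fraction
from itertools import combinations

population = [3, 5, 8, 12, 17]   # hypothetical values, N = 5
n = 2                            # sample size


def mean(xs):
    """Exact arithmetic mean as a fraction."""
    return Fraction(sum(xs), len(xs))


# Every possible sample of size n, each assumed equally probable.
samples = list(combinations(population, n))

# Mean of the frequency distribution of each estimate.
mean_of_sample_means = mean([mean(s) for s in samples])
mean_of_sample_maxima = mean([max(s) for s in samples])

print(mean_of_sample_means, mean(population))      # agree: unbiased
print(mean_of_sample_maxima, max(population))      # differ: biased
```

The sample mean averages out to the population mean over all possible samples, whereas the sample maximum falls short of the population maximum on the average.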

By sampling error or the precision of a sample 
result is meant how closely we can reproduce 
from a sample the results which could be obtained 
if a complete count of the population were made 
under the same conditions. The difference be- 
tween the sample result and the true value (popu- 
lation parameter) is called the accuracy of the 
sample survey. We are most interested in the 
accuracy, but it is the precision that is most 
frequently measured. The statistician aims to set up 
the sample design such that the combined effect 
of inaccuracy and imprecision will be at a minimum. 

As a working basis, it is often stated that the 
effect of bias on the accuracy of an estimate may 
be taken as negligible if the bias is less than one- 
tenth of the standard deviation of the estimate. 
The standard deviation of an estimate as calculat- 
ed from the sample does not contain the contribu- 
tion of the bias. However, any biased method 
must be interpreted with caution. There may al- 
so be bias in the estimate, which is unsuspected. 
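
The one-tenth working rule can be examined numerically. The sketch assumes the usual decomposition in which the mean square error of an estimate equals its variance plus the square of its bias; the function name `rmse_inflation` is hypothetical.

```python
import math


def rmse_inflation(bias_to_sd_ratio):
    """Factor by which the root-mean-square error exceeds the standard
    deviation when the bias is the given fraction of that deviation.

    Assumes mean square error = variance + bias**2.
    """
    return math.sqrt(1.0 + bias_to_sd_ratio ** 2)


for ratio in (0.0, 0.1, 0.5, 1.0):
    print(ratio, round(rmse_inflation(ratio), 4))
```

With a bias of one-tenth of the standard deviation, the total error exceeds the standard deviation by about half of one per cent, which is the sense in which such a bias may be taken as negligible.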
It is not possible to anticipate in any specific 
circumstance the magnitude of the error present 
in the estimate made. This determination would 
necessitate a knowledge of the population value. 
The standard method in statistical theory by which 
precision is assessed is that of investigating the 
frequency distribution generated for the estimate 
by repeated sampling from the same population. 
A useful simplification results from the assump- 
tion often observed in practice that the sample es- 
timates are approximately normally distributed. 
Accordingly, a measure of sampling error is ob- 
tained by calculating the sampling variance of the 
estimate, the reciprocal of which provides a measure 
of its precision. 
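
The idea of a frequency distribution generated by repeated sampling can be made concrete for a small population. The sketch below is an illustration under assumptions: the population values are invented, sampling is simple random sampling without replacement, and the formula compared against is the standard one for the variance of a sample mean under that design. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def sampling_variance_by_enumeration(population, n):
    """Variance of the sample mean over all equally likely samples."""
    means = [Fraction(sum(s), n) for s in combinations(population, n)]
    centre = sum(means) / len(means)
    return sum((m - centre) ** 2 for m in means) / len(means)


def sampling_variance_by_formula(population, n):
    """(S**2 / n) * (1 - n / N), with S**2 computed on N - 1."""
    size = len(population)
    mu = Fraction(sum(population), size)
    s_squared = sum((y - mu) ** 2 for y in population) / (size - 1)
    return s_squared / n * (1 - Fraction(n, size))


population = [3, 5, 8, 12, 17]   # hypothetical values
variance = sampling_variance_by_enumeration(population, 2)
precision = 1 / variance         # reciprocal of the sampling variance
print(variance, sampling_variance_by_formula(population, 2), precision)
```

The enumerated variance and the formula agree exactly here, and the reciprocal gives the measure of precision referred to above.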

Much of sampling theory is concerned with the 
derivation of formulas for the sampling variances 
of estimates secured by a variety of sample de- 
signs and procedures. The investigator has to be 
on guard in the use of formulas presumably used 
to measure precision without taking into account 
the method by which the sample was selected. Ap- 
propriate selection is based on two fundamental 
bases: (1) formulas measuring sampling errors 
should be based on information of the probability 
of an individual being included in the sample, and 
(2) such formulas are contingent upon the particu- 
lar sampling design applied. 

Among non-sampling errors in surveys are 
(1) errors due to non-response resulting from 
lack of measurement of sampling units in the sam- 
ple due to inability to locate some individuals or 
their unwillingness to answer when found, (2) er- 
rors of measurement resulting from unreliability 
and low validity in information provided, and (3) 
errors in processing, editing, and tabulation of 
the sample results. 

These types of errors as well as some other 
sources demand much consideration. Most of the 
work on sampling theory has related to ways of re- 
ducing sampling errors. However, a beginning 
has been made in meeting other types of difficulties, 
some of which require modification in the 
design and conduct of the investigation. We shall 
elaborate a little more fully on the 
types of errors arising from the fallibility of hu- 
man observers. 

Insofar as observational errors originate from 
unconscious bias they may be presumed to con- 
form more or less to the classical theory of er- 
rors, following at least approximately the normal 
distribution. If this is the case, positive and neg- 
ative errors would tend to cancel one another as 
the number of observations is increased. It is 
possible to make special studies of this situation 
by repeating the field observations for the same 
region or by having more than one observer and 
comparing the results. 

In sharp contrast to sampling fluctuations and 
observational errors of the above type, inaccuracies 
arise from false entries or deliberate failure 
to execute directions, including entries put 
down by pure guess. These returns are not amenable 
to statistical or probability analysis. 

The margin of error of the final estimate thus 
involves sampling fluctuations, observational errors 
falling partly within the scope of the classical 
theory of errors, and inaccuracies due to false 
entries or gross negligence on the part of the in- 
vestigators or respondents. The latter type may 
also contain other systematic errors. 

While it is practically impossible to bring the 
whole survey enterprise under what we call statis- 
tically controlled conditions by eliminating sys- 
tematic errors, it is possible to provide statisti- 
cal controls for detecting and guarding against 
many recording errors. Such ways, for example, 
would be the conduct of two or more independent 
surveys and the use of interpenetrating samples. 
A comparison of the different investigations 
would reveal the magnitude of recording mistakes 
and when more than two sets of records existed it 
would enable unreliable workers to be identified. 

Every effort must be made to obtain complete 
information for every member of the sample. This 
includes such plans as following up a random sample 
of delinquents. Even with this effort, it is usually 
impossible to secure complete coverage, particularly 
in human sampling, since some persons 
cannot be found, others are unable or unwilling to 
respond, and a few may have died. It is desirable, 
therefore, to distinguish between the exact population 
which has been sampled and the population 
initially defined in the plan of the investigation. 
These are called the sampled population and the 
target population, respectively. 

Population and Sampling Unit—An aggregate of 
individuals that possess a common character or 
characteristic may be termed a population. In a 
sample survey the populations with which we are 
concerned contain a finite number of units. This 
situation differs from the conception of the infinite 
population which plays a dominant role in statisti- 
cal theory. The difference in populations leads to 
different methods used to prove theorems. The 
results are slightly more involved when sampling 
is from a finite rather than an infinite population. 
The differences, however, are seldom important 
for practical purposes. The condition of an infinite 
population is assumed to be fulfilled in practice 
by sampling with replacement. 2 

The population we study may be large or small, 
but there must be a clearly defined population to 
begin with. What we study is some aspects or 
characteristics of the population. It is the popu- 
lation we wish to characterize from the informa- 
tion obtained from the sample, which is usually a 
small part of the population. We do not carry out 
sampling studies to learn about the properties of 
individuals. 

A population may at times be divided into units 
in a number of ways. For example, we may consider 
a city as comprised of a number of city blocks, 
as an aggregate of households, or of a 
number of persons. Since an alteration in the 
type of unit commonly affects both cost and 
precision of the sample, the selection of the best 
type of sampling unit is usually significant in the 
economics of sampling. 

We could select a sample of residents of a city 
by compiling a complete list of all the residents 
and then by selecting names of individual residents 
at random from the list. If we prepared a 
list of the blocks of a city (by the help of a map, 
say) and then selected a number of blocks at ran- 
dom, we would obtain a random sample of blocks, 
in each of which there would be one or more resi- 
dents. The sampling unit in this case, namely, 
the block, is sometimes called an ‘‘area’’ or 
‘‘cluster’’ sampling unit. 
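
The block procedure just described can be sketched as follows. The city, its blocks, and its residents are invented for illustration, and a seeded pseudo-random generator is assumed as a stand-in for whatever randomizing device would be used in practice; the function name `sample_blocks` is hypothetical.

```python
import random


def sample_blocks(blocks, k, seed=0):
    """Draw k blocks at random from the list of blocks and return the
    chosen blocks together with every resident living in them."""
    rng = random.Random(seed)            # seeded only for repeatability
    chosen = rng.sample(sorted(blocks), k)
    residents = [r for block in chosen for r in blocks[block]]
    return chosen, residents


# Hypothetical city: block name -> residents listed for that block.
city = {
    "block 1": ["Adams", "Baker"],
    "block 2": ["Clark"],
    "block 3": ["Davis", "Evans", "Foster"],
    "block 4": ["Green", "Hill"],
}

chosen_blocks, sampled_residents = sample_blocks(city, k=2)
print(chosen_blocks, sampled_residents)
```

Only a list of blocks is needed to draw the sample; residents enter it in clusters, which is why the choice of unit affects both cost and precision.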

From a knowledge of certain facts, it becomes 
possible to ascertain for a fixed cost or for a 
given accuracy, the optimum type of unit for a 
given investigation. Some techniques are avail- 
able for obtaining information about the optimum 
type of sample unit. For example, the analysis 
of variance can be used in the comparison of the 
precision of large and small sampling units. 


Basic Principles of Sample Surveys 





In order to obtain the kind of impartial, well- 
founded, and systematic knowledge at which the 
sample survey method aims, principles of design 
have been built up. The precision of the results 
procured from the sample survey is contingent 
not only on the size of the sample but also on 
other aspects of the sample design, such as the 
way the sample is chosen and the process of cal- 
culating the estimates from the survey results. 

Usually there is a plurality of alternative 
sample designs that might be used in a particular 
problem, and a comprehension of alternative designs, 
including a contrast of their relative efficiencies, 
is required if an appropriate selection 
is to be made. For this purpose modern sam- 
pling theory is providing powerful tools. The 
principle previously referred to of specified pre- 
cision at minimum cost enters repeatedly in mod- 
ern theory. 

We will now go from our discussion of the gen- 
eral methodology in sampling surveys to certain 
specific sampling designs. 


Sample Designs for Some Common 
Sampling Problems 








In planning sample surveys one proceeds in accordance 
with fundamental principles to fit a specific 
design to a projected investigation. There 
are no general rules leading to the selection of a 
design. Each problematic situation presents its 
own problems. A practical working guide is to 





use the simplest design best meeting the neeus of 
of the inquiry. This «aoes not exclude the use of 
complex designs when these best serve the investi- 
gator’s purpose. The sampling plan should be rep- 
resentative. The plan must include the way in 
which the sample is to be drawn, the relative 
chances for the selection of any two possible sam- 
ples, and the analysis specified which is to be 
used on the sample results. 

We can describe only briefly the main types of 
designs. We do so with the purpose of giving the 
interested reader an insight into sampling survey 
procedures with a minimum of mathematical sym- 
bols and of unexplained technical terms. 

Simple Random Sampling—This is the most ele- 
mentary type of sampling design. When every 
element of the population has an equal chance 
of being included in the sample, and when that 
chance is unaffected by the corresponding 
chance for any other element, the process is 
called a random sampling procedure. The result- 
ing samples are ‘‘random samples’’. The term 
‘‘random’’ implies that all possible samples of a 
given size have the same probability. 

In practice it is often difficult to obtain a ran- 
dom selection of elements from a population. It is 
not sufficient that the selection be haphazard. We 
must be certain that the method of selection and 
the values of the variable in the population are un- 
related. Where the population can be enumerated, 
however, it becomes possible to select a random 
sample by use of a table of random numbers. 3 
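
Where a table of random numbers is not at hand, a pseudo-random number generator can play the same role. The following is a minimal sketch in Python, assuming the population has been enumerated as the integers 1 to 1000; the function name, the frame, and the seed are illustrative and do not come from the text.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements so that every subset of size n is equally likely."""
    rng = random.Random(seed)          # a fixed seed makes the draw reproducible
    return rng.sample(list(population), n)


# Hypothetical frame: 1000 enumerated elements, numbered 1..1000.
frame = range(1, 1001)
sample = simple_random_sample(frame, 50, seed=1959)
print(sorted(sample)[:5])
```

Selection here is without replacement, so no element can enter the sample twice.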

Since each element of the population has a 
known (in this case, equal) chance of being includ- 
ed in the sample, it is observed that simple ran- 
dom samples are a special case of probability 
samples. Unless we know something more about 
our population we cannot do better than to select a 
simple random sample. It has the unique advan- 
tage that the precision of the estimates can be de- 
termined objectively without making questionable 
assumptions. One main objection to simple ran- 
dom sampling was the cost of carefully designing 
a satisfactory procedure which could be effective- 
ly carried out. This difficulty led to the improve- 
ments in pure random sampling procedures that 
reduced the costs of sampling. 

The introduction of the principle of randomiza- 
tion and of the analysis of variance as the tech- 
nique of analysis of sample observations has made 
possible the attainment of unbiased estimates of 
the quantities under survey and of determining the 
errors to which the estimates are subject. The 
analysis of variance, through making it possible 
to pool estimates of error and to separate compo- 
nents of error that are not homogeneous, has 
brought a drastic reduction in the number of inde- 
pendent sampling units required to be taken from 
each quantity of sampled material. This has 
made possible the development of sampling de- 
signs frequently involving samples in two or 
more stages. 

Stratified Random Sampling— This method of 
sampling utilizes supplementary information to 
obtain greater precision in the sample estimates. 
The population is first divided into sub-popula- 
tions called strata and from each stratum are 
drawn a pre-determined number of observations 
by random sampling, the drawings being made in- 
dependently in the different strata. 

In its simplest form the only necessary re- 
quirement for stratification is that the strata ac- 
tually differ one from another in the mean of the 
characteristic under measurement. If the divi- 
sion of the population into strata does not give 
strata that are homogeneous with respect to the 
characteristic under measurement, no gain will 
result from stratification. The basic idea in this 
method is that it may be possible to break down a 
heterogeneous population into relatively homogen- 
eous strata. A precise estimate can be obtained 
for each stratum and the several estimates can 
be combined into a precise estimate for the total 
population. 

The main part of the theory of stratified ran- 
dom sampling is concerned with the properties of 
the estimates obtained from this method and with 
the optimum choice of the sample size for the sev- 
eral strata. Proportionate stratified sampling 
employs a uniform fraction in the sample from 
each stratum. This procedure gives a self- 
weighting sample. 
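
Proportionate stratified sampling as just described can be sketched as follows. This is an illustration only: the two strata, their sizes, and the 10 per cent fraction are hypothetical, and the rounding rule for the stratum sample size is an assumption.

```python
import random


def proportionate_stratified_sample(strata, fraction, seed=None):
    """Apply the same sampling fraction independently within every stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        n_h = round(fraction * len(units))        # stratum sample size
        sample[name] = rng.sample(list(units), n_h)
    return sample


# Hypothetical population of 1000 units split into two strata.
strata = {"urban": list(range(600)), "rural": list(range(600, 1000))}
sample = proportionate_stratified_sample(strata, 0.10, seed=7)
print({name: len(units) for name, units in sample.items()})
```

Because every stratum contributes in proportion to its size, the unweighted mean of the pooled sample estimates the population mean; this is the sense in which the sample is self-weighting.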

It is not necessary, however, that the same 
proportion be taken from each of the strata. A 
fundamental theorem gives results leading to the 
optimum allocation of the sampling elements in the 
several strata. This theorem states that the sam- 
ple size in a stratum should be proportional to 
the product of the size and standard deviation of 
the stratum.4 A more general procedure is to 
relate optimum allocation to a stated total cost. 
However, since differences in cost affect the allo- 
cation of the sample merely proportionally to the 
square root of the relative costs per unit con- 
cerned, small differences in cost have very little 
significance. Little will be gained by introducing 
a cost function unless, say, the cost differences 
between strata involve a magnitude of three or 
more. Even with small differences between 
strata in standard deviations, if the cost differ- 
ences are substantial, it is advisable to introduce 
differential allocation of sampling elements to the 
strata. 
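
The allocation rule of the preceding paragraph can be put in a few lines. The sketch below assumes the usual form of the rule, in which the stratum sample size is proportional to the stratum size times its standard deviation, divided by the square root of the cost per unit; the function name and the figures are hypothetical, and scaling to a fixed total sample size (rather than to a fixed budget) is a simplification.

```python
from math import sqrt


def optimum_allocation(n_total, sizes, std_devs, unit_costs=None):
    """Split n_total among strata in proportion to N_h * S_h / sqrt(c_h)."""
    if unit_costs is None:
        unit_costs = [1.0] * len(sizes)           # equal costs: size times std. dev.
    weights = [N * S / sqrt(c) for N, S, c in zip(sizes, std_devs, unit_costs)]
    total = sum(weights)
    return [n_total * w / total for w in weights]


# Hypothetical strata: 600 units with S = 10, and 400 units with S = 30.
equal_cost = optimum_allocation(100, [600, 400], [10, 30])
# Same strata, but a unit in the second stratum costs four times as much.
with_cost = optimum_allocation(100, [600, 400], [10, 30], unit_costs=[1, 4])
print(equal_cost, with_cost)
```

In this example the more variable stratum takes two thirds of the sample when costs are equal, and a fourfold cost difference is needed to bring the two strata back to equal shares, which illustrates why small cost differences matter little.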

It is to be noted that a larger sample is re- 
quired in a variable stratum. If we can arrange 
the units of the population into strata such that 
there will be a large variance between strata 
and a small average variance within strata, there 
is a considerable gain in precision by stratifica- 
tion. This possibility depends on how adequately 
the strata have been defined. 

In determining the boundaries of strata, effec- 
tive use should be made of every kind of informa- 
tion helpful in allocating elements of the population 
into groups differing from one another in regard 
to the character under measurement or to the ex- 
pense of collecting data. But, of course, within 
each stratum, the sample has to be a probability 
sample since judgment must not enter into the se- 
lection of the individual sampling elements. 

Stratification should be regarded as only one of 
the means of sample designs to be taken into con- 
sideration with the aim of increasing the amount 
of information per unit cost. Another design 
which is sometimes more important in reducing 
costs is that of cluster sampling. 

Cluster Sampling—We have previously consid- 
ered several ways in which a population can be di- 
vided into units and have stated that a change in 
the type of unit bears a close relation both to sam- 
pling costs and the precision obtained. 

The term ‘‘elementary unit’’ denotes an individ- 
ual member of the particular population. This is 
the element on which measurements are desired 
and in the aggregate constitute the materials upon 
which the analyses are made, such as the deter- 
mination of averages and percentages. The ele- 
mentary unit is determined by the objectives of the 
survey and depends upon the analysis to be made. 
For example, we may wish to determine the medi- 
an teacher’s salary or the average family income. 
In the former case the individual teacher is the el- 
ementary unit, in the latter the family is the ele- 
mentary unit. The elementary unit is determined 
by the purpose of the survey and not by the sam- 
pling design. 

At times the objective of the survey does not 
necessitate the designation of an elementary unit 
since only aggregates are to be measured. Again, 
several elementary units may be utilized in the 
same survey. Such is the case where both individ- 
ual and family traits are estimated in the same 
sample survey. 

In cluster sampling one of the leading practical 
problems is to lay out and define the clusters. 
The population of elementary units under consider- 
ation is divided into groups or clusters, which are 
the primary sampling units. 

Cluster sampling may involve single or mul- 
tiple stage sampling. In single stage cluster sam- 
pling, a simple random sample is taken of the 
clusters into which the population has been divid- 
ed. Since the cluster is comprised of a cluster of 
the units of observations, it is not necessary to 
measure all of the units that make up the sampling 
unit. We may, therefore, select and measure a 
sample of the elements in any cluster. The term 
‘‘subsampling’’ is sometimes applied to this tech- 
nique since the primary sampling unit (the clus- 
ter) is not measured entirely, but is itself sam- 
pled. The term we apply here is two-stage sam- 
pling since the sample is taken in two steps. 
This type of sampling design involves the follow- 
ing procedures: 


1. The primary units are selected by simple 
random sampling. 

2. The second-stage units are chosen by sim- 
ple random sampling from the primary units in 
number 1 above. 

3. Uniform fractions are applied in number 2 
to all the primary sampling units selected. 
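
The three procedures above can be sketched as follows. The schools, pupils, and sampling fraction are hypothetical, and the rounding of the second-stage sample size is an assumption made for the illustration.

```python
import random


def two_stage_sample(clusters, n_primary, fraction, seed=None):
    """Stage 1: random sample of clusters. Stage 2: uniform fraction within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_primary)   # primary units
    sample = {}
    for name in chosen:
        units = clusters[name]
        m = max(1, round(fraction * len(units)))       # same fraction everywhere
        sample[name] = rng.sample(units, m)            # second-stage units
    return sample


# Hypothetical frame: 20 schools (primary units) of 40 pupils each.
clusters = {f"school_{i}": [f"pupil_{i}_{j}" for j in range(40)] for i in range(20)}
sample = two_stage_sample(clusters, n_primary=5, fraction=0.25, seed=3)
print({name: len(pupils) for name, pupils in sample.items()})
```

Only the five selected schools need a pupil roster, which is where the saving in frame construction arises.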


The processes can be extended to any number 
of stages. Each unit that falls into the sample at 
any particular stage is subdivided into units in 
preparation for the next stage. Thus, in three- 
stage sampling there will be primary, secondary, 
and tertiary units. Sometimes four stages are 
used and occasionally more. An example of a 
four-stage cluster sampling would be: first stage, 
sampling of the 87 counties in Minnesota; second 
stage, sampling of the cities or towns within the 
the cities or towns; fourth stage, sampling of the classes 
within the schools. 

In multi-stage sampling, there must be a 
frame (roster) or one must be constructed for 
every sampling unit that is to be sampled. Thus, 
to start with, there must be a frame that de- 
scribes all the primary units in the population. 
Then, for every primary unit that falls into the 
sample there must be a frame (or one must be 
compiled) that describes the secondary units. 
For each secondary unit that falls into the sample, 
there must be a frame that describes the tertiary 
units, and so forth. At each successive stage in 
the sequence, the sampling units become smaller 
and the frames become more and more detailed. 
In the last stage, the frame is comprised of the 
ultimate units, which might be single households, 
several successive households, individual per- 
sons, or small areas. 

Technically, one of the main economies of mul- 
tiple stage sampling is that the compilation of the 
frame for the next stage is necessary only for the 
units that have already been drawn into the sam- 
ple. The subunits within any larger unit have to 
be exhaustive; in the aggregate they must account 
for all of the larger unit, so that every person, 
every school, every business, and so on is in one 
and only one subunit at any given stage. If this 
is not true, the probability of any one person, 
school, farm, etc., being included in the sample 
will not correspond with the assumptions on which 
the mathematical theory is established. Hence 
the estimates of the errors of random sampling 
will be invalidated. 

A cluster sample almost always has a larger 
sampling error than a simple random sample of 
the same size. This is due to the fact that in 
sampling groups or aggregates of individuals of 
the population there is usually a positive intra- 
class correlation of the variable within the 
groups under investigation. 

The sampling unit practically obtainable in ed- 
ucational and psychological research is often the 
class, grade, or some other grouping of individu- 
al pupils. Cluster sampling is an extremely valu- 
able method, therefore, in educational and psycho- 
logical research. It is necessary in using it to 
know the conditions and means by which statistical 
estimates and the measures of their sampling er- 
ror may be accurately ascertained. 
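
The loss of precision caused by intraclass correlation is often summarized by a design effect. The sketch below uses the standard approximation for clusters of equal size, 1 + (m - 1)ρ, which is not given in the text and is introduced here only as an illustration; the class size and correlation are hypothetical.

```python
def design_effect(cluster_size, intraclass_corr):
    """Approximate variance inflation relative to a simple random sample."""
    return 1.0 + (cluster_size - 1) * intraclass_corr


def effective_sample_size(n_units, cluster_size, intraclass_corr):
    """Size of the simple random sample giving about the same precision."""
    return n_units / design_effect(cluster_size, intraclass_corr)


# Hypothetical survey: 30 classes of 25 pupils, intraclass correlation 0.10.
deff = design_effect(25, 0.10)
n_eff = effective_sample_size(30 * 25, 25, 0.10)
print(round(deff, 2), round(n_eff))
```

Under these assumed figures the 750 pupils carry roughly the information of 220 pupils drawn by simple random sampling.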

The principal advantage of cluster sampling and 
of various multi-stage sampling designs in the 
field survey is in the reduction in travel time. 
However, here as elsewhere the decision of wheth- 
er to use this sampling plan will depend on the rel- 
ative costs and precision to be obtained. 

‘‘Area sampling’’ is a method of sampling 
which makes use of such means as a clearly de- 
fined map or an aerial photograph of sampling 
units of small or large areas, as the case may be, 
in a particular region, when a definite numbering 
or a list of the sampling units is not available. 
The sampling unit may be the farm in surveys of 
acreage of crops or the individual dwelling units 
in certain social surveys. Here, neither the ident- 
ity of individual farms or dwelling units nor their 
numbers in the areas need be known in advance. 
But after having obtained the map or the areas pho- 
tographed, we could adopt a numbering procedure 
which could make possible the drawing of a ran- 
dom or probability sample. Further, the map or 
the aerial photograph could also be used for the 
choice of appropriate sampling units. 

Systematic sampling, to be discussed in the next 
section is a particular case of cluster sampling 
where the sample is a single cluster. 

Systematic Sampling—A convenient form of 
sampling is that which consists of taking the sam- 
pling units from a list of all the sampling units of 
the population. The term ‘‘frame’’ has been used 
by the U.N. subcommission on sampling to speci- 
fy this form of listing of the population. Since 
about 1944 there has been considerable theory de- 
veloped concerning this form of sampling, which 
has come to be known as systematic sampling. Ex- 
amples of a frame would be a listing of all the 
rural schools in a state or in the nation, or in a 
large manufacturing plant the complete list of em- 
ployees may be available in card files containing 
characteristics for each employee on an individual 
card. 

Thus it might be of interest to know the propor- 
tion of employees who had been graduated from 
high school. To carry out a sampling study one 
could take a sample, say, of 100 cards from a file 
comprised of 1000 cards. The first card chosen 
would be determined by giving each of the first 10 
cards a number, these numbers placed on uniform 
pieces of cardboard and then put in a hat and thor- 
oughly mixed. One piece of cardboard would then 
be selected at random. This operation is spoken 
of as making the first entry at random. Then 
one would proceed by taking the card in the posi- 
tion specified by the number and every tenth card 
thereafter until the sample of 100 had been taken. 
For illustration, if the number of the cardboard 
drawn from the hat were 8, the respective cards 
chosen from the file would be 8, 18, 28, etc. 

More generally, let us consider a finite popu- 
lation made up of the elements $x_1, x_2, \ldots, x_{nk}$, 
where $n$ and $k$ are integers. A systematic sam- 
ple is obtained by choosing an element at random 
from the elements $x_1, \ldots, x_k$ and then selecting 
every $k$th consecutive element, i.e., if $x_i$ is the ele- 
ment first chosen, the systematic sample com- 
prises the elements $x_i, x_{i+k}, \ldots, x_{i+(n-1)k}$. 

Systematic sampling has found substantial us- 
ages in practice since it is frequently easier to 
select and administer than is a random or strati- 
fied random sample. This is particularly true 
if the drawing of the sample occurs in the field. 
Then, too, this method possesses a certain intu- 
itive appeal in that the sample is spread evenly 
over the population. 
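
The selection rule described above, a random start among the first k elements followed by every k-th element, can be sketched as follows. The function and its parameters are illustrative; the sketch assumes, as the text does, that the population size is an exact multiple of k.

```python
import random


def systematic_sample(elements, k, start=None, seed=None):
    """Take every k-th element after a start position chosen among the first k."""
    if k <= 0 or len(elements) % k != 0:
        raise ValueError("this sketch assumes the population size is a multiple of k")
    if start is None:
        start = random.Random(seed).randint(1, k)   # 1-based, as in the card example
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and k")
    return elements[start - 1::k]


# The card-file example: 1000 cards, every tenth card, first entry at card 8.
cards = list(range(1, 1001))
drawn = systematic_sample(cards, k=10, start=8)
print(drawn[:3], len(drawn))
```

Passing `start=None` draws the first entry at random, which replaces the numbered pieces of cardboard in the hat.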

Possibly a more pertinent parallelism is that 
between systematic sampling and the stratified 
random sample having one element per stratum. 
In the latter method, the population is subdivided 
into $n$ strata, $(x_1, \ldots, x_k), (x_{k+1}, \ldots, x_{2k}), \ldots$, and 
one sampling unit selected independently at ran- 
dom from each of the strata. The difference is 
that in systematic sampling the units all come 
at the same relative position in the stratum while 
in stratified random sampling, the position of 
the element in the stratum is designated sepa- 
rately by randomization within each stratum. 

A systematic sample is easier to obtain than 
a simple random sample and it is also likely to 
be more accurate in case there is a trend in the 
observed values which follows the order of the 
numbering. As in simple random sampling, sys- 
tematic sampling gives every unit in the popula- 
tion an equal chance of being included in the sam- 
ple. If the units of the sequence are ordered at 
random with reference to the characteristic be- 
ing studied, the mean of a systematic sample 
will possess the same precision as the mean of 
a simple random sample. However, if the units 
vary at random within the strata and significant 
differences between the strata occur, then the 
systematic sample will give a mean of higher 
precision than the simple random sample. In 
general, the variance of the mean of a system- 
atic sample is dependent upon the correlation be- 
tween successive elements. Since this correla- 
tion is generally unknown, the investigator must, 
before he decides to use systematic sampling, 
study what patterns of variation the sampling 
units chosen usually give. 
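
The effect of a trend on precision can be checked exactly for a small artificial population, since a systematic design has only k possible samples. The sketch below assumes a hypothetical population whose values rise steadily with the order of listing and compares the variance of the systematic sample mean with the textbook variance of the mean of a simple random sample drawn without replacement.

```python
from statistics import mean, pvariance, variance


def systematic_mean_variance(values, k):
    """Exact variance of the sample mean over the k equally likely starts."""
    means = [mean(values[start::k]) for start in range(k)]
    return pvariance(means)


def srs_mean_variance(values, n):
    """Variance of the mean of a simple random sample of n, without replacement."""
    N = len(values)
    return (1 - n / N) * variance(values) / n


# Hypothetical population with a linear trend: values 1, 2, ..., 100.
values = list(range(1, 101))
v_sys = systematic_mean_variance(values, k=10)   # samples of n = 10
v_srs = srs_mean_variance(values, n=10)
print(v_sys, v_srs)
```

For this trending population the systematic design is much the more precise of the two; with a listing in random order no such advantage would be expected.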

The largest reduction in variance takes place 
when a high correlation exists between adjacent 
units on the frame with respect to the traits un- 
der measurement and when the serial correlation 
decreases with increase of the interval between 
units. Serial correlations are frequently found in 
situations where observations vary with time. Ex- 
amples are amount of fatigue of children at differ- 
ent hours, prices of stocks on different days, and 
temperatures at different times of day. Another 
such phenomenon is variation in plant growth on 
areas of soil with differing fertilities. 

A principal defect in systematic sampling is 
that there is no generally valid formula for esti- 
mating the sampling error of the estimate from 
the sample results. Various approximations are avail- 
able although these commonly give overestimates. 
The random start systematic sample estimate of 
the population mean is unbiased. 

A consistent estimate of the variance cannot be 
obtained from a systematic sample selected with 
a single random start. Some approximate esti- 
mates are, however, very useful for survey re- 
sults, when periodicities do not occur and serial 
correlations are not high between nearby units in 
the order of picking. 

Perhaps the biggest risk in using systematic 
sampling is with data that are periodic with re- 
spect to the order of the listing in the frame, that 
is, if the interval between units equals the period 
or some multiple of it. This danger in systemat- 
ic sampling from a population with periodicities 
is particularly great since the sample itself may 
afford no evidence of the periodicity. 
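
The danger can be seen in a small artificial case. The series below is hypothetical: a daily measurement with a weekly cycle, listed in time order, sampled with an interval equal to the period.

```python
from statistics import mean, pvariance

period = 7
# Hypothetical series: 70 days, lower values on the last two days of each week.
values = [10 if day % period >= 5 else 20 for day in range(70)]

# All possible systematic samples when the interval equals the period.
samples = [values[start::period] for start in range(period)]
sample_means = [mean(s) for s in samples]
within_sample_spread = [pvariance(s) for s in samples]

print(mean(values))            # population mean
print(sample_means)            # every sample mean is either 20 or 10
print(within_sample_spread)    # no sample shows any internal variation
```

Each sample falls on the same day of every week, so its mean is far from the population mean while its internal variation is zero; nothing in the sample itself reveals the cycle. Averaged over the seven possible starts the estimate is still unbiased, in agreement with the statement above about the random start.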

Since both substantial advantages and consider- 
able losses may sometimes occur in using system- 
atic sampling, the research worker needs to know 
the situations in which this method may result in materi- 
ally larger or substantially smaller sampling var- 
iances than would alternative methods of random 
selection. 


More Complex Sample Designs 





Up to now we have described the more com- 
monly known and used sampling designs: simple 
random sampling, stratified random sampling, 
cluster sampling, and systematic sampling as dis- 
crete sample designs. Often these designs are 
sufficient for the problem in hand. There is a 
class of problems, however, for which a consider- 
ably better solution can be gained by the use of 
combinations of these (and other) methods and by 
different estimating processes. One stage of the 
same sampling problem may involve one design 
followed by another and so on for several stages. 
Different kinds of probability samples are drawn 
at each stage. Different types of estimates may 
be used. 5 

In addition to the construction of designs, there 
are certain tools other than those mentioned pre- 
viously, which can result in a substantially better 
job of sampling. We can only mention examples 
of these briefly. 





JOHNSON 175 


Sometimes improved results are attained by 
the choice of more efficient estimators. For ex- 
ample, a method of double sampling might be 
used. This method involves two sampling inves- 
tigations. The first consists in drawing a large 
unrestricted sample from the population, deter- 
mining for each sampling unit the value of the 
character, the collection of information on which 
is easy and relatively inexpensive. This second- 
ary character is known to be highly correlated 
with the primary character with which the inves- 
tigation is concerned. The collection of data on 
the primary character is costly. 

The second investigation consists in drawing 
a small sample in which the values of both the 
primary and secondary characteristics are ascer- 
tained. It now becomes possible to find the re- 
gression of the primary on the secondary charac- 
ter. The predicted value in the regression equa- 
tion corresponding to the difference in the mean 
values of the secondary character in the two sam- 
ples is then used as an estimate of the mean val- 
ue of the primary character in the total popula- 
tion. 

Another example of a tool leading to an im- 
proved estimate is to permit sampling units to 
be drawn with arbitrary probabilities. Where it 
becomes possible to find a basis for assigning 
the arbitrary probabilities it is possible to make 
very substantial improvement in the efficiency of 
a sample. To obtain unbiased estimates with un- 
equal sampling probabilities, the sampling ele- 
ments are weighted by the reciprocal of the prob- 
ability. 

We shall conclude this discussion of sampling 
designs by describing briefly two sampling de- 
signs still in rather common use. These designs 
have the semblance but not the substance of prob- 
ability sampling designs. 


Quasi-Representative Sampling Plans 


Purposive Sampling—In purposive sampling 
some preliminary segregation of certain parts of 
the population is made and the sampling is re- 
stricted to these parts. These segregated parts 
are sometimes those thought by the investigator 
or by some solicited authority or expert to be 
typical of the population, for instance, there may 
be judgment selection with respect to certain 
characteristics of the population of typical or 
‘‘representative’’ counties, cities, schools, indi- 
vidual households, blocks and so forth. At other 
times selection is made because of convenience, 
that is, any group that might be handy, such as a 
class of students. 

A more objective method of purposive selec- 
tion is restricted to sampling aggregates which 
have the same average as the population with re- 
spect to one or more controls. It is assumed 
that the entire aggregate should make up the sam- 
ple. The belief is that since the controls have av- 
erages in the sample the same as those in the pop- 
ulation, the means of the investigated variables, 
assumed to be positively correlated with the con- 
trols, will accordingly be better estimated. The 
distinguishing feature of this sample plan, then, is 
the restriction of the sampling to the part of the 
population picked on the basis of the control aver- 
ages. The variability of the known quantitative 
characteristic(s), as well as of the other charac- 
ters closely correlated with it, will clearly be con- 
siderably less than the real variability in the popu- 
lation. 

A variant of the above plan is observed in the 
attempt to get a ‘‘perfect cross-section’’ from a 
sample with the last census on certain characteris- 
tics. Thus, it is possible to make up a ‘‘sample’’ 
of persons by adding and subtracting individuals so 
that finally the sample corresponds almost exactly 
with the last census on, say, age-groups, sex, edu- 
cation, economic status, and others. This sam- 
pling plan is very hazardous since it may fail al- 
most completely to agree with the population that it 
was designated to represent with respect to the 
characteristics the survey was contemplated to 
measure. 

Advocates of purposive sampling claim as ad- 
vantages for it that it is sometimes possible to use 
this method where randomization is not possible 
and that the enumeration covering selected areas 
or districts would be less expensive. The main dis- 
advantages of the method are that (1) substantial in- 
formation of the population must be had in advance 
of the sample, (2) the controls used are frequently 
defective, and (3) the method is not amenable to the 
development of a sampling theory since it includes 
no element of random sampling. One cannot obtain 
from the sample itself an objective measure of the 
precision of the sample estimates. 

Quota Sampling—This method is a variant of 
purposive selection. As it is used in practice by 
a number of agencies, interviewers are given as- 
signed quotas of people of different age-groups, 
socio-economic status, etc., and are instructed 
to secure the specified number of interviews in 
each quota. Added directions proposed to avoid 
excessively unrepresentative selections within the 
allotted quotas are sometimes given. Thus an in- 
terviewer may be asked to secure twelve interviews 
with housewives, who are not employed full time, 
who own their own houses, etc. The enumerator 
is instructed to continue sampling until the re- 
quired ‘‘quota’’ has been secured in each stratum. 
The interviewer does not select the interviewees 
at random. He may take advantage of any knowl- 
edge that enables him to fill his quota quickly. Vary- 
ing amounts of latitudes are allowed the interview- 
er. The interviews are not often carried out by 
house-to-house canvass but may at times be done 
by interviewing in streets or other public places, 
or even now and then by telephone. 
The objective is to get the advantages of strat- 
ification without the higher field costs resulting 
from selecting units at random. It is evident 
that, however accurately the quotas are met, 
such samples do not constitute random samples. 
Consequently, sampling theory cannot be applied 
to quota sampling. The accuracy of the results 
must be based on assumptions and judgments that 
cannot be estimated objectively. Accordingly, 
information cannot be obtained from the sample 
results to assay their precision. These methods 
often deal with collecting opinions and their re- 
puted success is likely due to the fact that the 
validity of opinions is rarely, if ever, tested. 

Accordingly, the danger of bias always exists 
and the quota method has to be ruled out as an 
appropriate method of investigation for precise 
inquiries where unbiased results are indispen- 
sable. 

It should not be inferred that since the pur- 
posive and quota sampling methods are types of 
judgment samples, prior knowledge and judgment 
do not enter in the design of probability samples. 
Knowledge and judgment are made use of in a 
number of ways, for example, in defining the 
kind and size of units of sampling, in laying out 
homogeneous and heterogeneous areas, and in 
reduction of sampling error by classifying sam- 
pling elements into strata in an appropriate way. 
The point is, however, that this information and 
processes are not permitted to influence the final 
selection of the particular sampling items that 
are to comprise the sample. The final selec- 
tions must be automatic, that is, by random pro- 
cesses, beyond the control of the investigator. It 
is only by this safeguard that the bias of selec- 
tion is eliminated and the magnitude of the sam- 
pling error measurable and controllable. 


FOOTNOTES 


1. Presidential address on ‘‘The U. N. Subcom- 
mission on Statistical Sampling’’ at the ses- 
sion on sampling, International Statistical In- 
stitute, Berne, September 1949. 


2. This process of sampling will never exhaust 
the population. Where a continuous mathemat- 
ical function, e.g., the normal curve is used 
to represent the observations, the effect is to 
replace a finite by an infinite population. 


3. A Million Random Digits with 100,000 Normal 
Deviates (Glencoe, Ill.: Rand Corporation, the 
Free Press, 1955), 600 pp. 





4. For an illustration of the application of this the- 
orem, see: Palmer O. Johnson, Statistical 
Methods in Research (New York: Prentice-Hall, 


5. For an example of more complex sample de- 
signs, see Palmer O. Johnson and M. S. Rao, 
Modern Sampling Designs: Theory, Practice, 
Experimentation (Minneapolis: University 
of Minnesota Press, 1959), 100 pp. 
TABLES FOR TRANSMUTATION OF ORDERS 
OF MERIT INTO NORMAL EQUIVALENTS 


KENNETH E. ANDERSON 
University of Kansas 
E. L. BARNHART* 
Kansas State Teachers College 
Emporia, Kansas 


TABLES¹ PUBLISHED in 1954 were adapted 
from a table presented by C. L. Hull² in 1922 for 
the purpose of changing orders of merit, or ranks, 
into normalized scores. The tables were devel- 
oped to obviate computing the percent positions of 
the individuals of a group when it was desired to 
find their normalized scores. In its original 
form, Hull’s table contained corresponding values 
of ‘‘percent position’’ and normalized scores. The 
‘‘percent position’’ was defined as 


$$P = \frac{100(R - 0.5)}{N}$$


where R is the rank of the individual in the series 
and N is the number of individuals ranked. By 
means of this table, then, it was possible to pro- 
vide a set of normalized scores on a given charac- 
teristic for a group of individuals by first ranking 
them on the characteristic, then transforming the 
ranks into percent positions by the formula, and 
finally obtaining from the table the corresponding 
normally distributed scores. 

The tables published in 1954 as adapted from 
Hull were based on a range of ranked ability arbi- 
trarily cut off at plus and minus 2.5 standard 
deviations. The baseline of his curve was 5 stand- 
ard deviations and each of the 100 parts was equal 
to 0.05 standard deviations. In order to avoid 
this shortened range of ability, Tables I through 
VIII were developed in terms of the unit normal 
curve. They contain the normal equivalents (T 
scores) corresponding to every rank in groups of 
all sizes from 1 to 100 individuals. In order to 
find the normalized score for a given individual it 
is necessary only to find the table column corres- 
ponding to the number of individuals in the group 
and the table row corresponding to the rank of the 
individual in the group. The score will lie at their 
intersection. For example, suppose an individual 
ranks 8th in a group of 35 persons with respect to 
a given characteristic. Locating the table column 
corresponding to ‘‘size of class’’ equal to 35 and 
the table row corresponding to ‘‘rank in class’’ 
equal to 8, we find a value of 58 at their intersec- 
tion. This value is the score, out of a possible 
100, which would theoretically be made by the 8th 
ranked individual in a group of 35, if the scores 
were normally distributed. 

The top portion of Table I gives the normal 
equivalents of ranks in groups of all sizes from 1 
to 25, where 


$$T = 50 + \frac{10(X - M)}{\sigma}$$

Thus, a rank of 1 in 25 has a percent position of: 


$$P = \frac{100(R - 0.5)}{N} = \frac{100(1 - 0.5)}{25} = 2.00$$


Referring to the unit normal curve, we obtain a 
x/σ of 2.05. Thus the normalized equivalent of a 
rank of 1 in 25 is: 

T = 50 + 10(2.05) = 70.5 or 71. 
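

The same conversion can be carried out without a printed table. The following is a sketch, assuming that the percent position is measured from the top of the group, so that the normal deviate is the point below which 100 minus the percent position per cent of the curve lies; the function names are illustrative and the inverse normal curve is taken from Python's standard library.

```python
from statistics import NormalDist


def percent_position(rank, group_size):
    """Percent position of a rank, 100(R - .5)/N, counted from the top."""
    return 100 * (rank - 0.5) / group_size


def t_score(rank, group_size):
    """Normal equivalent of a rank: T = 50 + 10(x/sigma)."""
    p = percent_position(rank, group_size)
    z = NormalDist().inv_cdf(1 - p / 100)   # x/sigma on the unit normal curve
    return 50 + 10 * z


print(percent_position(1, 25))   # 2.0
print(round(t_score(1, 25)))     # rank 1 in a group of 25
print(round(t_score(8, 35)))     # rank 8 in a group of 35
```

The computed values agree, after rounding, with the 71 worked out above and with the 58 read from the table for the 8th rank in a group of 35. Small differences in the first decimal place can arise because the printed normal table gives x/σ to two places only.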


The process used to obtain the normal equiva- 
lents is illustrated below for a group size of 31. 
The percent position for a rank of 1 in 31 was ob- 
tained as follows: 


$$P = \frac{100(1 - 0.5)}{31} = 1.612903$$

$$100.00 - 1.612903 = 98.387097$$


The percent position for a rank of 2 in 31 could 
be obtained by the same process. However, the 
following constant was used to obtain the percent 
position: 


$$100(1/31) = 3.225806$$

$$98.387097 - 3.225806 = 95.161291$$


The same constant was subtracted from each suc- 
ceeding percent position to obtain the next percent 
position. Having obtained the percent positions, 
the x/σ values were obtained from a unit normal 
table, and then converted to T scores by using the 
following formula: 


T = 50 + 10(x/σ) 


% Position     x/σ     T Score 

98.387097      2.14    71.4 
95.161291      1.66    66.6 
91.935485      1.40    64.0 
88.709679      1.21    62.1 
85.483873 


* Formerly Assistant in Statistical Laboratory, School of Education, University of Kansas. 

1. Kenneth E. Anderson and others. ‘‘Tables for Transmutation of Orders of Merit into Units of Amount 
or Scores,’’ Journal of Experimental Education, XXII (March 1954), pp. 247-55. 

2. C. L. Hull. ‘‘The Computation of Pearson’s r from Ranked Data,’’ Journal of Applied Psychology, 
VI (1922), pp. 385-90. 





JOURNAL OF EXPERIMENTAL EDUCATION 
(Volume 27, March 1959) 


ANALYSIS OF COMPLEX CONTINGENCY DATA’ 


CYRIL J. HOYT, P. R. KRISHNAIAH, E. PAUL TORRANCE** 
University of Minnesota 


THE METHODS of multiple and partial corre- 
lation and regression have been used in the inves- 
tigation of numerous problems in education and 
psychology. These methods, when used proper- 
ly, have provided research workers with an ade- 
quate means of investigating problems of the in- 
terdependence among variables that are meas- 
ured on interval or ratio scales. In many cases, 
however, the variables considered are measured 
in terms of nominal scales by classifying data 
into categories. Under these circumstances, 
contingency tables are often developed. In many 
instances the contingency data are classified on 
more than two bases, thus giving rise to com- 
plex contingency tables which have dimensionality 
of three or more. 

Part II of this paper shows the derivation of 
maximum likelihood estimates of probabilities 
that are used for testing certain hypotheses analogous 
to those involved in problems of partial 
and multiple regression. Part I of this paper 
gives a four dimensional illustrative contingency 
table and shows how to apply the procedure for 
testing a number of hypotheses. 


PART I 


The data given in Table I consist of the fre- 
quencies in each sub-category of a four-way 
classification of the April 1939 status of 13,968 
Minnesota high-school graduates of June 1938. 
The four variates considered are (a) position by 
thirds in high-school graduating class, (b) post 
high-school status in April 1939, (c) sex, and 
(d) father’s occupational level in seven categor- 
ies. The statistical procedure discussed in Part 
II of this paper is used to test certain hypotheses 
concerning the dependence of post high-school 
status on the other three variables. This paper 
does not contain a complete analysis but does in- 
clude sufficient tests to indicate the usefulness 
of the procedure. 

The hypothesis analogous to the test of the significance of the multiple regression of post high-school status on the other three variables is designated as $H_1$. Thus $H_1$ may be stated: post high-school status is independent of high-school graduation class-rank, sex and paternal occupational level. If $i_2$ designates post high-school status, and $i_1$, $i_3$ and $i_4$ indicate high-school graduation rank, sex and paternal occupational level respectively, the probability of an observation falling in the $(i_1, i_2, i_3, i_4)$th cell may be designated as $p_{i_1 i_2 i_3 i_4}$. Marginal and submarginal probabilities are designated by replacing one or more of the i subscripts by zeros. Thus, for example, $p_{0 i_2 0 0}$ indicates the four probabilities that an observation falls in a specified ($i_2$ = 1, 2, 3 or 4) category of post high-school status. That is, the subscript ‘‘0’’ indicates that all categories on that particular variable are summed. In terms of these symbols $H_1$ may be designated as

$$p_{i_1 i_2 i_3 i_4} = p_{0 i_2 0 0}\; p_{i_1 0 i_3 i_4} \quad \text{for all } i_1, i_2, i_3, i_4$$

*All footnotes will be found at end of article.

Part II shows that the maximum likelihood estimates of these p’s are obtained by taking the ratios of the corresponding marginal totals or subtotals to the total number of observations in the table. Thus the chi-square test statistic appropriate for testing $H_1$ is:

$$\chi_1^2 = \sum_{i_1}\sum_{i_2}\sum_{i_3}\sum_{i_4} \frac{n_{i_1 i_2 i_3 i_4}^2}{n\,\hat{p}_{i_1 i_2 i_3 i_4}} - n$$

where $\hat{p}_{i_1 i_2 i_3 i_4} = \hat{p}_{0 i_2 0 0}\; \hat{p}_{i_1 0 i_3 i_4}$,

$$\hat{p}_{0 i_2 0 0} = \frac{n_{0 i_2 0 0}}{n}$$

and

$$\hat{p}_{i_1 0 i_3 i_4} = \frac{n_{i_1 0 i_3 i_4}}{n} \quad \text{by (1) of Part II}$$

for all values of $i_1$, $i_2$, $i_3$ and $i_4$.


In this example (computation notes are given in Part III) the value of $\chi_1^2$ = 2838. This value is interpreted as a $\chi^2$ with 123 degrees of freedom. Part II indicates the degrees of freedom are determined by the formula $(m_2 - 1)(m_1 m_3 m_4 - 1)$ which in this case is 3[(3)(2)(7) - 1] or 123. Thus $H_1$ is rejected and the conclusion drawn that the multiple dependence of post high-school status on the other three variables is significant.
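A test of this kind, one set of variates against the remaining set, can be sketched for a table of any dimensionality as follows. The function name `chi_square_two_sets` and the small frequency tables are hypothetical illustrations, not the data of Table I; the sketch also assumes every marginal total is positive.

```python
from collections import defaultdict


def chi_square_two_sets(counts, set_a):
    """Chi-square for independence of the variates in `set_a` from all others.

    counts: dict mapping a cell index tuple (i1, ..., ik) to its frequency.
    set_a:  positions (0-based) of the variates forming the first set.
    Returns (chi_square, degrees_of_freedom).
    """
    n = sum(counts.values())
    k = len(next(iter(counts)))
    set_b = [j for j in range(k) if j not in set_a]

    # Submarginal totals, i.e. the n's with the other set's subscripts zeroed.
    margin_a = defaultdict(int)
    margin_b = defaultdict(int)
    for cell, freq in counts.items():
        margin_a[tuple(cell[j] for j in set_a)] += freq
        margin_b[tuple(cell[j] for j in set_b)] += freq

    chi_square = 0.0
    for key_a, n_a in margin_a.items():
        for key_b, n_b in margin_b.items():
            cell = [None] * k
            for j, level in zip(set_a, key_a):
                cell[j] = level
            for j, level in zip(set_b, key_b):
                cell[j] = level
            observed = counts.get(tuple(cell), 0)
            expected = n_a * n_b / n  # n times the product of the two estimates
            chi_square += (observed - expected) ** 2 / expected

    df = (len(margin_a) - 1) * (len(margin_b) - 1)
    return chi_square, df


# Hypothetical 2 x 2 x 2 table in which the middle variate is exactly
# independent of the other two, so the statistic is zero.
weights = {(0, 0): 3, (0, 1): 6, (1, 0): 9, (1, 1): 12}
table = {(a, b, c): (b + 1) * w for (a, c), w in weights.items() for b in (0, 1)}
print(chi_square_two_sets(table, [1]))
```

With the single variate placed in `set_a`, the degrees of freedom returned agree with the form $(m_2 - 1)(m_1 m_3 m_4 - 1)$ used in the text, provided no marginal category is empty.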

Further investigation of this dependence may suggest the testing of such hypotheses as $H_{1a}$ and/or $H_{1b}$. $H_{1a}$ concerns the dependence of post high-school status on high-school graduation rank for girls in highest paternal occupational level while $H_{1b}$ concerns the dependence of post high-school status on paternal occupational level for boys in the highest third of their graduating class. These hypotheses are tested with ordinary two-way chi-square tests. Table II gives the frequency table for testing $H_{1a}$ and Table III gives that for testing $H_{1b}$. Both of these hypotheses are also rejected since $\chi_{1a}^2$ = 77 and $\chi_{1b}^2$ = 245 where d.f. are 6 and 18 respectively.

Other hypotheses of this type or of a more complex type can be tested for the further investigation of the multiple regression of high-school status on the other three variables. See also $H_4$ and $H_5$ below.

$H_2$ is another hypothesis which, in some instances, would be of preliminary interest in studying the mutual interdependence among a set of variables. That is, $H_2$ states that the four variables, high-school rank, post high-school status, sex and paternal occupational level are mutually independent. In terms of the notation described above

$$H_2: p_{i_1 i_2 i_3 i_4} = p_{i_1 0 0 0}\; p_{0 i_2 0 0}\; p_{0 0 i_3 0}\; p_{0 0 0 i_4} \quad \text{for all } i_1, i_2, i_3, i_4$$

The maximum likelihood estimates of $p_{i_1 i_2 i_3 i_4}$ are obtained by taking the products of estimates

$$\hat{p}_{i_1 0 0 0} = \frac{1}{n}(n_{i_1 0 0 0}) \quad \text{for all values of } i_1$$

$$\hat{p}_{0 i_2 0 0} = \frac{1}{n}(n_{0 i_2 0 0}) \quad \text{for all values of } i_2$$

$$\hat{p}_{0 0 i_3 0} = \frac{1}{n}(n_{0 0 i_3 0}) \quad \text{for all values of } i_3$$

$$\hat{p}_{0 0 0 i_4} = \frac{1}{n}(n_{0 0 0 i_4}) \quad \text{for all values of } i_4$$

which are obtained by using (3) of Part II.

The appropriate test statistic for testing $H_2$ is again of the form $\chi_1^2$ where the $\hat{p}$ are defined above.

$$\chi_2^2 = \sum_{i_1}\sum_{i_2}\sum_{i_3}\sum_{i_4} \frac{n_{i_1 i_2 i_3 i_4}^2}{n\,\hat{p}_{i_1 i_2 i_3 i_4}} - n$$

The value of $\chi_2^2$ for the data in Table I is 3812 which is found to be significant for $\chi^2$ with 155 degrees of freedom. Thus the hypothesis of mutual independence of the four variables is refuted. The number of degrees of freedom is determined as

$$(m_1)(m_2)(m_3)(m_4) - (m_1 + m_2 + m_3 + m_4) + 4 - 1$$


$H_3$ illustrates a third type of hypothesis which may be of interest in some cases. Symbolic notation of $H_3$ is

$$H_3: p_{i_1 i_2 i_3 i_4} = p_{i_1 i_2 0 0}\; p_{0 0 i_3 i_4} \quad \text{for all } i_1, i_2, i_3, i_4$$

This hypothesis is stated: High-school graduation rank and post high-school status are independent of sex and paternal occupational level. Part II shows the maximum likelihood estimates required for testing this hypothesis and gives a formula for $\chi_3^2$ exactly like that for $\chi_1^2$ given above, using the following values for $\hat{p}_{i_1 i_2 i_3 i_4}$:

$$\hat{p}_{i_1 i_2 i_3 i_4} = \frac{1}{n^2}(n_{i_1 i_2 0 0})(n_{0 0 i_3 i_4}) \quad \text{by using (4) of Part II.}$$

For the data in Table I, the $\chi_3^2$ for testing $H_3$ has a value of 2420 which is significant for 143 degrees of freedom. Thus, a student’s high-school rank and post high-school status are related to his sex and paternal occupational level. The general form for the number of degrees of freedom for $H_3$ is

$$m_1 m_2 m_3 m_4 - m_1 m_2 - m_3 m_4 + 1$$


$H_4$ and $H_5$ are hypotheses of the type analogous to the significance of partial regression coefficients. These are concerned with a consideration of the dependence of other variables while holding paternal occupational level constant. Thus $H_4$ may be stated: when paternal occupational level is held constant, high-school rank, post high-school status and sex are mutually independent. $H_5$ may be stated: when paternal occupational level is held constant, post high-school status is independent of high-school rank and sex. Hypothesis $H_{1a}$ tests a hypothesis more limited than $H_5$ for girls in the highest level of paternal occupational class.

$$H_4: p_{i_1 i_2 i_3 i_4} = \frac{p_{i_1 0 0 i_4}\; p_{0 i_2 0 i_4}\; p_{0 0 i_3 i_4}}{p_{0 0 0 i_4}^2} \quad \text{for all } i_1, i_2, i_3, i_4$$

The maximum likelihood estimates of $p_{i_1 i_2 i_3 i_4}$ are

$$\hat{p}_{i_1 i_2 i_3 i_4} = \frac{n_{i_1 0 0 i_4}\; n_{0 i_2 0 i_4}\; n_{0 0 i_3 i_4}}{n\; n_{0 0 0 i_4}^2} \quad \text{for all } i_1, i_2, i_3, i_4$$

With the above estimates of $p_{i_1 i_2 i_3 i_4}$ substituted in the formula given for $\chi_1^2$, the test statistic appropriate for testing $H_4$ is obtained.

$$H_5: p_{i_1 i_2 i_3 i_4} = \frac{p_{i_1 0 i_3 i_4}\; p_{0 i_2 0 i_4}}{p_{0 0 0 i_4}} \quad \text{for all } i_1, i_2, i_3, i_4$$

The test statistic $\chi_5^2$ for testing $H_5$ is obtained from $\chi_1^2$ by replacing $\hat{p}_{i_1 i_2 i_3 i_4}$ with

$$\hat{p}_{i_1 i_2 i_3 i_4} = \frac{n_{i_1 0 i_3 i_4}\; n_{0 i_2 0 i_4}}{n\; n_{0 0 0 i_4}} \quad \text{for all } i_1, i_2, i_3, i_4$$

For the data in Table I, $\chi_4^2$ = 107,813 and has 119 degrees of freedom while $\chi_5^2$ = 1374 and has 105 degrees of freedom. Thus, the test of $H_4$ indicates that three other variables are not mutually independent for fixed paternal occupational level while the test of $H_5$ indicates that post high-school status depends on high school rank and sex for fixed paternal occupational level.


TABLE II

FREQUENCY FOR EACH HIGH-SCHOOL RANK X POST HIGH-SCHOOL STATUS, FOR GIRLS IN HIGHEST PATERNAL OCCUPATIONAL LEVEL

                                  High-School Rank
Post High-School Status     Lowest Third   Middle Third   Upper Third

In College                       53            163            309
In Collegiate School              7             30             17
Employed Full-time               13             28             38
Other                            89


TABLE III

FREQUENCY FOR EACH POST HIGH-SCHOOL STATUS X PATERNAL OCCUPATIONAL LEVEL FOR BOYS IN UPPER THIRD OF GRADUATING CLASS

Paternal Occupational Level     Post High-School Status

The particular hypotheses which were tested in this illustrative example were selected with the aim toward showing the variety of tests that can be made with the $\chi^2$ statistic discussed in Part II. In any particular research problem the hypotheses tested will depend upon the questions under investigation. For example, a thorough study of the partial dependence of post high-school status on the other variables would proceed systematically by testing other hypotheses than $H_4$ and $H_5$ by holding constant other variates singly, in pairs and in triplets. In this case, however, there is abundant evidence of the dependence of post high-school status on all of the other variables.
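The degrees of freedom quoted for the five hypotheses can be checked against the general forms given in Part II. The sketch below is an illustration only; the function names are hypothetical, and the category counts (3 ranks, 4 statuses, 2 sexes, 7 occupational levels) are those of the example.

```python
from math import prod


def df_one_vs_rest(m_single, m_rest):
    """H1 type: one variate independent of all the others jointly."""
    return (m_single - 1) * (prod(m_rest) - 1)


def df_mutual(m):
    """H2 type: complete (mutual) independence of all variates."""
    return prod(m) - sum(m) + len(m) - 1


def df_two_sets(m_a, m_b):
    """H3 type: one set of variates independent of a second set."""
    return (prod(m_a) * prod(m_b) - 1) - (prod(m_a) + prod(m_b) - 2)


def df_mutual_given(m_free, m_given):
    """H4 type: mutual independence of the other variates, last one held constant."""
    k = len(m_free) + 1
    return m_given * (prod(m_free) - sum(m_free) + k - 2)


def df_two_sets_given(m_a, m_b, m_given):
    """H5 type: two sets independent, one further variate held constant."""
    return m_given * (prod(m_a) * prod(m_b) - prod(m_a) - prod(m_b) + 1)


m1, m2, m3, m4 = 3, 4, 2, 7  # rank, status, sex, paternal occupational level

degrees = {
    "H1": df_one_vs_rest(m2, [m1, m3, m4]),
    "H2": df_mutual([m1, m2, m3, m4]),
    "H3": df_two_sets([m1, m2], [m3, m4]),
    "H4": df_mutual_given([m1, m2, m3], m4),
    "H5": df_two_sets_given([m2], [m1, m3], m4),
}
print(degrees)
```

The five values agree with the 123, 155, 143, 119 and 105 degrees of freedom reported in Part I.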


PART II 


The statistical theory basic to this type of analysis of complex contingency data was extended for the three-way table by Roy and Mitra.² The work below shows how this theory can be applied to a k-way table. The work of Roy and Mitra is further used to show that the test statistic appropriate for testing each hypothesis is distributed as Chi Square asymptotically.

Consider a multiway table when each dimension represents a variate $i_1, i_2, \ldots, i_k$. Also, let $i_j = 1, 2, \ldots, m_j$ where $j = 1, 2, \ldots, k$. Let $n$, $n_{i_1 \ldots i_k}$, $p_{i_1 i_2 \ldots i_k}$ respectively denote the total number of observations in all cells, number of observations in the $(i_1, \ldots, i_k)$th cell and the probability of an observation falling in the $(i_1, \ldots, i_k)$th cell. Also, where one or more of the subscripts ‘‘$i_j$’’ is (or are) replaced by ‘‘0’’ in $n_{i_1 i_2 \ldots i_k}$, this indicates that the $n_{i_1 \ldots i_k}$ are summed over those variates. Similar explanation holds good for $p_{i_1 \ldots i_k}$ also.

Here the $n_{i_1 \ldots i_k}$’s are observed values and the $p_{i_1 \ldots i_k}$’s are the values in the population. The $n_{i_1 \ldots i_k}$’s are distributed as a multinomial distribution.

$$\frac{n!}{\prod n_{i_1 \ldots i_k}!}\; \prod p_{i_1 \ldots i_k}^{\,n_{i_1 \ldots i_k}}$$

where $\prod$ denotes that the product is taken over all possible values of $(i_1 \ldots i_k)$.

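Hypothesis of independence between ‘‘$i_1$’’ and ‘‘$i_2 \ldots i_k$’’:

$$H_1: p_{i_1 i_2 \ldots i_k} = (p_{i_1 0 0 \ldots 0})(p_{0 i_2 \ldots i_k})$$

Under $H_1$ the likelihood becomes

$$\frac{n!}{\prod n_{i_1 \ldots i_k}!}\; \prod_{i_1} (p_{i_1 0 0 \ldots 0})^{\,n_{i_1 0 0 \ldots 0}} \prod_{i_2 \ldots i_k} (p_{0 i_2 \ldots i_k})^{\,n_{0 i_2 \ldots i_k}}$$

where $\prod$ denotes that the product is taken over all possible values of $(i_1 \ldots i_k)$. The p’s are estimated by the maximum likelihood method. The p’s are estimated subject to the restrictions $\Sigma_1\, p_{i_1 0 0 \ldots 0} = \Sigma_2\, p_{0 i_2 \ldots i_k} = 1$ where $\Sigma_1$ and $\Sigma_2$ respectively denote that the summations run over ‘‘$i_1$’’ and ‘‘$i_2 \ldots i_k$’’.

Now consider

$$L = \Sigma_1\, n_{i_1 0 0 \ldots 0} \log p_{i_1 0 0 \ldots 0} + \Sigma_2\, n_{0 i_2 \ldots i_k} \log p_{0 i_2 \ldots i_k} + \lambda\,(\Sigma_1\, p_{i_1 0 0 \ldots 0} - 1) + \mu\,(\Sigma_2\, p_{0 i_2 \ldots i_k} - 1)$$

where $\lambda$ and $\mu$ are Lagrangian multipliers. Taking the partial derivatives of L with respect to $p_{i_1 0 0 \ldots 0}$, $p_{0 i_2 \ldots i_k}$, equating the derivatives to zero, and solving for $p_{i_1 0 0 \ldots 0}$ and $p_{0 i_2 \ldots i_k}$ gives:

$$\hat{p}_{i_1 0 0 \ldots 0} = \frac{n_{i_1 0 0 \ldots 0}}{n}$$

$$\hat{p}_{0 i_2 \ldots i_k} = \frac{n_{0 i_2 \ldots i_k}}{n}$$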
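The intermediate algebra is not shown in the text; a sketch of the usual Lagrange-multiplier argument, written with the same symbols and offered only as one way of filling in the step, is:

```latex
\begin{aligned}
\frac{\partial L}{\partial p_{i_1 0 \ldots 0}}
  &= \frac{n_{i_1 0 \ldots 0}}{p_{i_1 0 \ldots 0}} + \lambda = 0
  \quad\Longrightarrow\quad
  p_{i_1 0 \ldots 0} = -\frac{n_{i_1 0 \ldots 0}}{\lambda}, \\[4pt]
\Sigma_1\, p_{i_1 0 \ldots 0} = 1
  &\quad\Longrightarrow\quad
  -\frac{1}{\lambda}\,\Sigma_1\, n_{i_1 0 \ldots 0} = -\frac{n}{\lambda} = 1
  \quad\Longrightarrow\quad
  \lambda = -n, \\[4pt]
\hat{p}_{i_1 0 \ldots 0} &= \frac{n_{i_1 0 \ldots 0}}{n}.
\end{aligned}
```

The same steps applied to $p_{0 i_2 \ldots i_k}$ give $\mu = -n$ and the second estimate.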


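The appropriate test statistic for testing $H_1$ is

$$\chi_1^2 = \Sigma\, \frac{(n_{i_1 \ldots i_k} - n\,\hat{p}_{i_1 i_2 \ldots i_k})^2}{n\,\hat{p}_{i_1 \ldots i_k}} \qquad (2)$$

where $\Sigma$ denotes that the summation runs over all possible values of ‘‘$i_1 \ldots i_k$’’ and where $\hat{p}_{i_1 i_2 \ldots i_k} = \hat{p}_{i_1 0 0 \ldots 0}\; \hat{p}_{0 i_2 \ldots i_k}$ and the values of $\hat{p}_{i_1 0 0 \ldots 0}$ and $\hat{p}_{0 i_2 \ldots i_k}$ are given above. $\chi_1^2$ is distributed as $\chi^2$ with $(m_1 - 1)(m_2 \cdots m_k - 1)$ degrees of freedom.

The hypothesis of complete independence between ‘‘$i_1 \ldots i_k$’’ is

$$H_2: p_{i_1 \ldots i_k} = p_{i_1 0 0 \ldots 0} \cdots p_{0 0 \ldots 0 i_k}$$

where the p’s are subject to the restrictions

$$\sum_{i_1} p_{i_1 0 0 \ldots 0} = \cdots = \sum_{i_k} p_{0 0 \ldots 0 i_k} = 1$$

By following a procedure similar to that used above the maximum likelihood estimates are obtained:

$$\hat{p}_{i_1 0 0 \ldots 0} = \frac{n_{i_1 0 0 \ldots 0}}{n}, \;\; \ldots, \;\; \hat{p}_{0 0 \ldots 0 i_k} = \frac{n_{0 0 \ldots 0 i_k}}{n}$$

The appropriate test statistic for $H_2$ is $\chi_2^2$ which is of precisely the same form as $\chi_1^2$ where

$$\hat{p}_{i_1 i_2 \ldots i_k} = \hat{p}_{i_1 0 0 \ldots 0} \cdots \hat{p}_{0 0 \ldots 0 i_k}$$

$\chi_2^2$ is distributed asymptotically as $\chi^2$ with

$$\prod_{j=1}^{k} m_j - \sum_{j=1}^{k} m_j + k - 1$$

degrees of freedom.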
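The hypothesis of independence between two sets of variates

$$H_3: p_{i_1 \ldots i_k} = p_{i_1 \ldots i_h 0 0 \ldots 0}\; p_{0 0 \ldots 0 i_{h+1} \ldots i_k}$$

when $h < k$ where

$$\Sigma_1\, p_{i_1 i_2 \ldots i_h 0 0 \ldots 0} = \Sigma_2\, p_{0 0 0 \ldots 0 i_{h+1} \ldots i_k} = 1$$

and $\Sigma_1$, $\Sigma_2$ respectively denote that the summations run over $(i_1 \ldots i_h)$ and $(i_{h+1} \ldots i_k)$.

By following a procedure similar to that used for $H_1$ the following maximum likelihood estimates are obtained:

$$\hat{p}_{i_1 \ldots i_h 0 0 \ldots 0} = \frac{n_{i_1 \ldots i_h 0 0 \ldots 0}}{n}$$

$$\hat{p}_{0 0 \ldots 0 i_{h+1} \ldots i_k} = \frac{n_{0 0 \ldots 0 i_{h+1} \ldots i_k}}{n}$$

The test statistic for $H_3$ is $\chi_3^2$ which has the same form as $\chi_1^2$ where

$$\hat{p}_{i_1 \ldots i_k} = \frac{n_{i_1 \ldots i_h 0 0 \ldots 0}\; n_{0 0 \ldots 0 i_{h+1} \ldots i_k}}{n^2}$$

$\chi_3^2$ is distributed as $\chi^2$ with

$$\Big(\prod_{j=1}^{k} m_j - 1\Big) - \Big(\prod_{j=1}^{h} m_j + \prod_{j=h+1}^{k} m_j - 2\Big)$$

degrees of freedom.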
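Hypothesis of independence between ‘‘$i_1, i_2 \ldots i_{k-1}$’’ given $i_k$:

$$H_4: p_{i_1 \ldots i_k} = \frac{p_{i_1 0 0 \ldots 0 i_k}\; p_{0 i_2 0 0 \ldots 0 i_k} \cdots p_{0 0 \ldots 0 i_{k-1} i_k}}{p_{0 0 \ldots 0 i_k}^{\,k-2}}$$

where the p’s are subject to the restrictions

$$\sum_{i_1} p_{i_1 0 0 \ldots 0 i_k} = \sum_{i_2} p_{0 i_2 0 0 \ldots 0 i_k} = \cdots = \sum_{i_{k-1}} p_{0 0 \ldots 0 i_{k-1} i_k} = p_{0 0 0 \ldots 0 i_k}$$

and

$$\sum_{i_k} p_{0 0 0 \ldots 0 i_k} = 1$$

By following a procedure similar to that used for $H_1$ the maximum likelihood estimates are obtained:

$$\hat{p}_{i_1 0 0 \ldots 0 i_k} = \frac{n_{i_1 0 0 \ldots 0 i_k}}{n}, \;\; \ldots, \;\; \hat{p}_{0 0 \ldots 0 i_{k-1} i_k} = \frac{n_{0 0 \ldots 0 i_{k-1} i_k}}{n}$$

The test statistic for $H_4$ is $\chi_4^2$ which has the same form as $\chi_1^2$ where

$$\hat{p}_{i_1 \ldots i_k} = \frac{\hat{p}_{i_1 0 0 \ldots 0 i_k} \cdots \hat{p}_{0 0 0 \ldots 0 i_{k-1} i_k}}{\hat{p}_{0 0 0 0 \ldots 0 i_k}^{\,k-2}}$$

$\chi_4^2$ is distributed as $\chi^2$ with

$$m_k \Big[\prod_{j=1}^{k-1} m_j - \sum_{j=1}^{k-1} m_j + k - 2\Big]$$

degrees of freedom.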
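Hypothesis of independence between ‘‘$i_1 \ldots i_h$’’ and ‘‘$i_{h+1} \ldots i_{k-1}$’’ given ‘‘$i_k$’’:

$$H_5: p_{i_1 \ldots i_k} = \frac{p_{i_1 \ldots i_h 0 \ldots 0 i_k}\; p_{0 0 0 \ldots 0 i_{h+1} \ldots i_{k-1} i_k}}{p_{0 0 \ldots 0 i_k}}$$

where the p’s are subject to the constraints

$$\Sigma_1\, p_{i_1 i_2 \ldots i_h 0 0 \ldots 0 i_k} = \Sigma_2\, p_{0 0 0 \ldots 0 i_{h+1} \ldots i_k} = p_{0 0 \ldots 0 i_k} \quad \text{and} \quad \sum_{i_k} p_{0 0 \ldots 0 i_k} = 1$$

where $\Sigma_1$ and $\Sigma_2$ respectively denote that the summation runs over all possible values of $(i_1 \ldots i_h)$ and $(i_{h+1} \ldots i_{k-1})$. Following a procedure similar to that used for $H_1$ the maximum likelihood estimates are obtained:

$$\hat{p}_{i_1 \ldots i_h 0 0 \ldots 0 i_k} = \frac{n_{i_1 \ldots i_h 0 0 \ldots 0 i_k}}{n}$$

$$\hat{p}_{0 0 0 \ldots 0 i_{h+1} \ldots i_k} = \frac{n_{0 0 0 \ldots 0 i_{h+1} \ldots i_k}}{n}$$

$$\hat{p}_{0 0 \ldots 0 i_k} = \frac{n_{0 0 \ldots 0 i_k}}{n}$$

The test statistic for $H_5$ is $\chi_5^2$ which has the same form as $\chi_1^2$ where

$$\hat{p}_{i_1 \ldots i_k} = \frac{\hat{p}_{i_1 \ldots i_h 0 \ldots 0 i_k}\; \hat{p}_{0 0 0 \ldots 0 i_{h+1} \ldots i_k}}{\hat{p}_{0 0 0 \ldots 0 i_k}}$$

$\chi_5^2$ is distributed as $\chi^2$ with

$$m_k \Big[\prod_{j=1}^{k-1} m_j - \prod_{j=1}^{h} m_j - \prod_{j=h+1}^{k-1} m_j + 1\Big]$$

degrees of freedom.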


PART III 


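In order to calculate $\chi_1^2$ the computational formula used is:

$$\chi_1^2 = \sum_{i_2}\Big[\sum_{i_1}\sum_{i_3}\sum_{i_4} \frac{n_{i_1 i_2 i_3 i_4}^2}{n_{0 i_2 0 0}\; n_{i_1 0 i_3 i_4}}\Big]\, n - n = \sum_{i_2} \frac{n}{n_{0 i_2 0 0}} \Big(\sum_{i_1}\sum_{i_3}\sum_{i_4} \frac{n_{i_1 i_2 i_3 i_4}^2}{n_{i_1 0 i_3 i_4}}\Big) - n$$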

The first step in the calculation consists of computing the $n_{i_1 0 i_3 i_4}$ sub-marginal totals in Table I. This means that within each sex by high-school rank by occupational level the sub-totals must be obtained by adding over all four categories of post high-school status. Beginning in the upper left-hand corner, 87, 3, 17 and 105 are added to give 212 for $n_{1011}$. Similarly, 72, 6, 18, and 209 are added to give 305 for $n_{1012}$. Thus for the boys in the upper third of their high school class there are seven such sub-totals. Considering seven sub-totals for each sex by high-school rank classification, 42 such sub-totals must be obtained, one for each line of the three vertical sub-divisions of Table I.

The second step consists of calculating the quantities $n_{i_1 i_2 i_3 i_4}^2 / n_{i_1 0 i_3 i_4}$. These values should be added by going down each of the four columns representing each of the post high-school status categories. Thus for the first post high-school status category one adds

$$\frac{87^2}{212} + \frac{72^2}{305} + \cdots \quad \text{or } 125.1$$

for the lowest third on high-school rank. To this is added

$$\frac{216^2}{352} + \frac{159^2}{420} + \cdots + \frac{19^2}{398} + \frac{25^2}{207} \quad \text{or } 460.1$$

for the middle third and the corresponding sum, or 952.7, for the upper third. These three numbers are then added together to give 1537.9 which is then divided by $n_{0100}/n$. Here n is the total sum for the whole table or 13,968 while $n_{0100}$ is the number of people enrolled in college, that is, in the first class of post high-school status, or 3945.

The third step consists of repeating step two for category two of post high-school status. For this category one adds

$$\frac{3^2}{212} + \frac{6^2}{305} + \cdots \quad \text{or } 5.25$$

to the corresponding sum for the middle third, or 17.86, and that for the upper third, or 21.96. These three sums are added to give 45.07 which is multiplied by 13,968 and then divided by 678.

The fourth and fifth steps carry on a similar series of calculations for the third and fourth categories of post high-school status. The total for category three is 133.3 and for category four is 5181.6. The divisor for category three is 1232/13,968 and that for category four of post high-school status is 8113/13,968.

From the sum of the four quantities derived in steps 2, 3, 4 and 5, n or 13,968 is subtracted to give $\chi_1^2$ = 2838.
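The final arithmetic of steps 2 through 5 can be retraced directly from the four column totals and the four status subtotals quoted above. The snippet is a sketch; the variable names are illustrative only, and the inputs are the rounded figures printed in the text.

```python
n = 13_968                                   # total number of graduates
status_totals = [3945, 678, 1232, 8113]      # n_{0 i2 0 0} for the four statuses
column_sums = [1537.9, 45.07, 133.3, 5181.6] # sums of n^2 / n_{i1 0 i3 i4}

# Each column sum is divided by n_{0 i2 0 0} / n, the four quotients are
# added, and n is subtracted from the result.
chi_square_1 = sum(s * n / t for s, t in zip(column_sums, status_totals)) - n
print(round(chi_square_1))
```

Rounded to the nearest whole number this gives the 2838 reported for $\chi_1^2$.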


The computation of $\chi_2^2$ for testing $H_2$ is begun by calculating the subtotals for each category on each of the four bases on which the 13,968 graduates have been classified. It is found that 3694 graduated in the lowest third of their class, 5584 in the middle third and 4690 in the upper third. Likewise, when the 13,968 youth are classified on the variate $i_2$ it is found that 3945 were enrolled in college, that is, in the first category of post high-school status, 678 in category two, 1232 in category three and 8113 in category four. There were 6207 boys and 7761 girls. The subtotals for the seven occupational levels were: 1826; 2184; 4464; 2649; 1021; 995 and 829.

The computational formula for $\chi_2^2$ is

$$\chi_2^2 = n^3 \sum_{i_1}\sum_{i_2}\sum_{i_3}\sum_{i_4} \frac{n_{i_1 i_2 i_3 i_4}^2}{n_{i_1 0 0 0}\; n_{0 i_2 0 0}\; n_{0 0 i_3 0}\; n_{0 0 0 i_4}} - n$$

These sums can be calculated in a number of systematic orders. One good system is to divide the work into six equivalent parts by doing certain computations for each sex by high school rank category. If this system is used there are 28 separate terms added for each of the six categories. For the males in the lowest high school rank category the following terms are calculated:

$$\frac{87^2}{n_{0100}\, n_{0001}} + \frac{3^2}{n_{0200}\, n_{0001}} + \frac{17^2}{n_{0300}\, n_{0001}} + \cdots$$

Then the sum of the above 28 terms is divided by the product of 6207 and 3694, the subtotals for males and lowest high-school rank category.

The 28 terms for the males in the middle third are obtained by calculating as follows:

$$\frac{216^2}{n_{0100}\, n_{0001}} + \cdots$$

The sum of these 28 terms is divided by the product of 6207 and 5584.

The entries in each of the four other sixths of Table I are treated in a similar way. When the six quotients have been obtained for the six parts of Table I, these quotients are added so that their sum may be multiplied by $n^3$. We then subtract n from the resultant quantity.

For calculating $\chi_3^2$ the following formula is used:

$$\chi_3^2 = n \Big[\sum_{i_1}\sum_{i_2}\sum_{i_3}\sum_{i_4} \frac{n_{i_1 i_2 i_3 i_4}^2}{n_{i_1 i_2 0 0}\; n_{0 0 i_3 i_4}} - 1\Big]$$

The subtotals needed for the first factor in the denominator can be found by adding the twelve columns in Table I. These sums, 578, 117, 217, 2782, 1410, 277, 503, 3394, 1957, 284, 512, and 1937, are $n_{i_1 i_2 0 0}$ for $i_1$ = 1, 2, 3 and $i_2$ = 1, 2, 3, 4. Likewise, the fourteen subtotals needed for the second factor in the denominator can be obtained by adding across each of the fourteen rows of Table I. These sums are 885 for the first line, 1034 for the second, and 1797, 1242, 450, 436, 362, 941, 1150, 2667, 1406, 571, 559, and 467 for the other twelve rows.

The calculations can be carried out by going down the columns. For the first column the sum of

$$\frac{87^2}{885} + \frac{72^2}{1034} + \frac{52^2}{1797} + \cdots + \frac{9^2}{559} + \frac{3^2}{467}$$

is divided by 578. For the second column, the same denominators are used but the divisor of the sum is $n_{1200}$ or 117. A similar procedure is repeated for each of the twelve columns of Table I, using as divisor the one of twelve column sums that corresponds to the column used.


DETERMINANTS IN MULTIVARIATE 
CORRELATION’ 


E. MURIEL J. WRIGHT, WINTON H. MANNING, PHILIP H. DUBOIS 
Washington University 
St. Louis, Missouri 


IN THE computation of coefficients of multiple correlation and related statistics, several modes of attack are available, differing in economy, flexibility, and interpretability of intermediate steps. Among these are: the formula methods of Yule (20), which involve partial r’s and partial standard deviations of various orders, but which are practical only when the number of variates is small; generalized routines for solving n simple simultaneous linear equations with n unknowns, such as the methods described by Aitken (1), Crout (3), Dwyer (7, 8), Horst (10, 11), Waugh (18), and Wherry (19); and the method of reduction of criterion variance recently proposed by DuBois (4, 5), which synthesizes a simplification of Yule’s formulation of the multivariate problem with a matrix computing routine. This last method offers considerable flexibility and economy in calculation, while simultaneously providing that all cell entries at any stage in the routine are meaningful as variances or covariances of higher order residuals or as beta coefficients. It is our purpose to point out relationships between conventional methods of solving determinants and this variance-covariance procedure.

Example 1 illustrates the solution of a multiple correlation problem by the DuBois method. We begin with a matrix of (n + 1) variates, of which n are independent variables, or predictors, with the dependent variable or criterion (always found in the extreme right-hand column) designated ‘‘0’’. Each original variate is in z-form with mean of zero and unit variance. Entries in the principal diagonal of the original and successive matrices are variances. Those in the remaining cells are covariances in z-form, which in the original matrix are numerically equivalent to r’s. (See Example 1).

*All footnotes will appear at end of article.

Successive matrices are produced, each with one row and one column less than the preceding. Every element in these matrices is a variance or covariance of residuals in higher order z-form. In any particular instance a residual is the original value less the portion associated with previously eliminated variates. The final, single-element matrix is the residual variance of the criterion after the portions associated with the predictor variates have been removed. When the residual variance of the criterion is subtracted from unity, the square of the coefficient of multiple correlation is obtained. Any higher order partial covariance may be transformed into a partial correlation by dividing it by the appropriate partial standard deviations. Part correlations may be calculated by dividing a partial covariance by a single partial standard deviation. The complete set of beta coefficients of the (n - 1)st order is obtained through a conventional back solution.
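The successive-matrix routine just described can be sketched as follows. This is a minimal illustration of the reduction idea, not DuBois’ published computing layout; the function names and the three-variate correlation matrix are hypothetical, and the back solution for the betas is omitted.

```python
def reduce_once(matrix):
    """Eliminate the first variate, returning residual variances and covariances."""
    pivot = matrix[0][0]
    size = len(matrix)
    return [
        [matrix[j][k] - matrix[j][0] * matrix[0][k] / pivot for k in range(1, size)]
        for j in range(1, size)
    ]


def multiple_r_squared(matrix):
    """Squared multiple correlation of the last variate (the criterion)
    on all preceding variates (the predictors)."""
    current = [row[:] for row in matrix]
    while len(current) > 1:
        current = reduce_once(current)  # one row and one column fewer each time
    residual_variance = current[0][0]   # final, single-element matrix
    return 1.0 - residual_variance


# Hypothetical correlations: variates ordered 2, 1, 0 (criterion last).
r12, r02, r01 = 0.5, 0.3, 0.4
correlations = [
    [1.0, r12, r02],
    [r12, 1.0, r01],
    [r02, r01, 1.0],
]
print(multiple_r_squared(correlations))
```

For two predictors the result can be compared with the familiar closed form $(r_{01}^2 + r_{02}^2 - 2 r_{01} r_{02} r_{12}) / (1 - r_{12}^2)$, which the routine reproduces.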

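This computing routine is allied with well-known procedures for evaluating symmetrical determinants of any order and is similar to Dwyer’s (6) method of single division applied to symmetric matrices. A presentation of the mathematical basis for this variance-covariance method shows its relationship to formulas for multivariate correlation written in the notation of determinants.


EXAMPLE 1

SUCCESSIVE MATRICES OF VARIANCES AND COVARIANCES OF VARIATES IN z-FORM

$$\begin{array}{ccccc}
V_n & C_{n(n-1)} & C_{n(n-2)} & \cdots & C_{n0} \\
 & V_{(n-1)} & C_{(n-1)(n-2)} & \cdots & C_{(n-1)0} \\
 & & V_{(n-2)} & \cdots & C_{(n-2)0} \\
 & & & \ddots & \vdots \\
 & & & & V_0
\end{array}$$

Matrix of First Order Residuals

$$\begin{array}{cccc}
V_{(n-1).n} & C_{(n-1)(n-2).n} & \cdots & C_{(n-1)0.n} \\
\beta_{(n-2)(n-1).n} & V_{(n-2).n} & \cdots & C_{(n-2)0.n} \\
\vdots & & \ddots & \vdots \\
\beta_{0(n-1).n} & & & V_{0.n}
\end{array}$$

Matrix of Second Order Residuals

$$\begin{array}{ccc}
V_{(n-2).(n-1)n} & \cdots & C_{(n-2)0.(n-1)n} \\
\vdots & \ddots & \vdots \\
\beta_{0(n-2).(n-1)n} & & V_{0.(n-1)n}
\end{array}$$

Matrix of nth Order Residuals


Aitken (1) has demonstrated that a determinant of order r, $a_{11} \neq 0$, may be evaluated simply by reducing it to a determinant of order (r - 1) times a factor, thence to a determinant of order (r - 2) times two factors, etc. This is achieved by (r - 1) repetitions of the step:

$$\begin{vmatrix}
a_{11} & a_{12} & \cdots & a_{1r} \\
a_{21} & a_{22} & \cdots & a_{2r} \\
a_{31} & a_{32} & \cdots & a_{3r} \\
\vdots & & & \vdots \\
a_{r1} & a_{r2} & \cdots & a_{rr}
\end{vmatrix}
= a_{11}
\begin{vmatrix}
a_{22} - \dfrac{a_{21} a_{12}}{a_{11}} & \cdots & a_{2r} - \dfrac{a_{21} a_{1r}}{a_{11}} \\
\vdots & & \vdots \\
a_{r2} - \dfrac{a_{r1} a_{12}}{a_{11}} & \cdots & a_{rr} - \dfrac{a_{r1} a_{1r}}{a_{11}}
\end{vmatrix}$$

This may for convenience be expressed

$$a_{11}
\begin{vmatrix}
b_{22} & \cdots & b_{2r} \\
\vdots & & \vdots \\
b_{r2} & \cdots & b_{rr}
\end{vmatrix}
\qquad \text{where } b_{jk} = a_{jk} - \frac{a_{j1}\, a_{1k}}{a_{11}}$$

When this reduction process is complete

$$D^{(r)} = a_{11}\, b_{22}\, c_{33} \cdots l_{rr}$$

where $b_{22}$ represents cell 1 of determinant order (r - 1), $c_{33}$ cell 1 of determinant order (r - 2), etc.

This process of evaluation of a determinant is of immediate interest in DuBois’ method of reduction of criterion variance.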
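The reduction step can be illustrated with a short routine that applies $b_{jk} = a_{jk} - a_{j1} a_{1k} / a_{11}$ repeatedly and multiplies the leading elements together. It is a sketch only: the function name is hypothetical, and no reordering of rows is attempted when a leading element is zero, in keeping with the stated condition $a_{11} \neq 0$.

```python
def determinant_by_condensation(matrix):
    """Evaluate a determinant as the product of successive leading elements."""
    current = [row[:] for row in matrix]
    product = 1.0
    while current:
        pivot = current[0][0]
        if pivot == 0:
            raise ZeroDivisionError("leading element is zero; reorder variates first")
        product *= pivot
        size = len(current)
        # b_jk = a_jk - a_j1 * a_1k / a_11, giving a determinant of order one less.
        current = [
            [current[j][k] - current[j][0] * current[0][k] / pivot for k in range(1, size)]
            for j in range(1, size)
        ]
    return product


example = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 10.0],
]
print(determinant_by_condensation(example))
```

For a matrix of correlations the successive leading elements are the residual variances, so the same loop yields the product form used in the derivation that follows.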
When the original matrix of variances and covariances in z-form is viewed as a determinant, the first lower order determinant obtained is comparable to the matrix which forms the first order residuals, viz.

$$D^{n+1} = \begin{vmatrix}
V_n & C_{n(n-1)} & C_{n(n-2)} & \cdots & C_{n0} \\
C_{(n-1)n} & V_{(n-1)} & C_{(n-1)(n-2)} & \cdots & C_{(n-1)0} \\
C_{(n-2)n} & C_{(n-2)(n-1)} & V_{(n-2)} & \cdots & C_{(n-2)0} \\
\vdots & & & & \vdots \\
C_{0n} & C_{0(n-1)} & C_{0(n-2)} & \cdots & V_0
\end{vmatrix}$$

Following the process of reduction outlined above, it is necessary next to divide column 1 by $V_n$. Thus the element in the first column, first row cell becomes 1 and the other cells in the column become β’s, so that:

$$D^{n+1} = V_n \begin{vmatrix}
1 & C_{n(n-1)} & C_{n(n-2)} & \cdots & C_{n0} \\
\beta_{(n-1)n} & V_{(n-1)} & C_{(n-1)(n-2)} & \cdots & C_{(n-1)0} \\
\beta_{(n-2)n} & C_{(n-2)(n-1)} & V_{(n-2)} & \cdots & C_{(n-2)0} \\
\vdots & & & & \vdots \\
\beta_{0n} & C_{0(n-1)} & C_{0(n-2)} & \cdots & V_0
\end{vmatrix}$$

$$= V_n \begin{vmatrix}
V_{(n-1)} - \beta_{(n-1)n} C_{n(n-1)} & C_{(n-1)(n-2)} - \beta_{(n-1)n} C_{n(n-2)} & \cdots & C_{(n-1)0} - \beta_{(n-1)n} C_{n0} \\
C_{(n-2)(n-1)} - \beta_{(n-2)n} C_{n(n-1)} & V_{(n-2)} - \beta_{(n-2)n} C_{n(n-2)} & \cdots & C_{(n-2)0} - \beta_{(n-2)n} C_{n0} \\
\vdots & & & \vdots \\
C_{0(n-1)} - \beta_{0n} C_{n(n-1)} & C_{0(n-2)} - \beta_{0n} C_{n(n-2)} & \cdots & V_0 - \beta_{0n} C_{n0}
\end{vmatrix}$$

$$= V_n \begin{vmatrix}
V_{(n-1).n} & C_{(n-1)(n-2).n} & \cdots & C_{(n-1)0.n} \\
C_{(n-2)(n-1).n} & V_{(n-2).n} & \cdots & C_{(n-2)0.n} \\
\vdots & & & \vdots \\
C_{0(n-1).n} & C_{0(n-2).n} & \cdots & V_{0.n}
\end{vmatrix}$$
This complete determinant is comparable to the triangular matrix of first order residual variances and covariances of Example 1. Following the process of reduction outlined above, it is necessary next to divide column 1 by $V_{(n-1).n}$. Thus the first column, first row cell becomes 1 and the other cells in the column become β’s, such that

$$D^{n+1} = V_n V_{(n-1).n} \begin{vmatrix}
1 & C_{(n-1)(n-2).n} & C_{(n-1)(n-3).n} & \cdots & C_{(n-1)0.n} \\
\beta_{(n-2)(n-1).n} & V_{(n-2).n} & C_{(n-2)(n-3).n} & \cdots & C_{(n-2)0.n} \\
\vdots & & & & \vdots \\
\beta_{0(n-1).n} & C_{0(n-2).n} & C_{0(n-3).n} & \cdots & V_{0.n}
\end{vmatrix}
\qquad \text{where } \beta_{i(n-1).n} = \frac{C_{i(n-1).n}}{V_{(n-1).n}}$$

Reducing the determinant,

$$= V_n V_{(n-1).n} \begin{vmatrix}
V_{(n-2).n} - \beta_{(n-2)(n-1).n} C_{(n-1)(n-2).n} & \cdots & C_{(n-2)0.n} - \beta_{(n-2)(n-1).n} C_{(n-1)0.n} \\
C_{(n-3)(n-2).n} - \beta_{(n-3)(n-1).n} C_{(n-1)(n-2).n} & \cdots & C_{(n-3)0.n} - \beta_{(n-3)(n-1).n} C_{(n-1)0.n} \\
\vdots & & \vdots \\
C_{0(n-2).n} - \beta_{0(n-1).n} C_{(n-1)(n-2).n} & \cdots & V_{0.n} - \beta_{0(n-1).n} C_{(n-1)0.n}
\end{vmatrix}$$

Differences may be expressed in residual variance and covariance notation.

$$= V_n V_{(n-1).n} \begin{vmatrix}
V_{(n-2).(n-1)n} & C_{(n-2)(n-3).(n-1)n} & C_{(n-2)(n-4).(n-1)n} & \cdots & C_{(n-2)0.(n-1)n} \\
C_{(n-3)(n-2).(n-1)n} & V_{(n-3).(n-1)n} & C_{(n-3)(n-4).(n-1)n} & \cdots & C_{(n-3)0.(n-1)n} \\
\vdots & & & & \vdots \\
C_{0(n-2).(n-1)n} & C_{0(n-3).(n-1)n} & C_{0(n-4).(n-1)n} & \cdots & V_{0.(n-1)n}
\end{vmatrix}$$

This is the completion of the triangular matrix of 2nd order residual variances and covariances of Example 1.

Repetition of this reduction process, it is seen, will determine the successive higher order residuals of DuBois’ treatment and permit final expression of the original determinant as a product of residual variances, namely,

$$D^{n+1} = V_n\, V_{(n-1).n}\, V_{(n-2).(n-1)n} \cdots V_{1.23 \ldots n}\, V_{0.12 \ldots n} \qquad (1)$$

It is noted that use of z-scores ensured that $V_n = 1$.

It is also useful to express $D_{01}^{n+1}$ as a product of residual variances and covariances. It has been demonstrated that

$$D^{n+1} = V_n\, V_{(n-1).n} \cdots V_{2.34 \ldots n}\, V_{1.23 \ldots n}\, V_{0.12 \ldots n} \qquad (1)$$

Now $D_{01}^{n+1}$ is obtained from $D^{n+1}$ by omission of variates of row 0, column 1. Because these are the last two variates in the matrix, product (1) is unaffected by such omission except for the last two residual variances.

By examination of the determinant from which these are obtained, we note

$$D^{n+1} = V_n\, V_{(n-1).n} \cdots V_{2.34 \ldots n} \begin{vmatrix}
V_{1.23 \ldots n} & C_{01.23 \ldots n} \\
C_{01.23 \ldots n} & V_{0.23 \ldots n}
\end{vmatrix}$$

Omission of row 0, column 1 affects this product only at this stage so,

$$D_{01}^{n+1} = V_n\, V_{(n-1).n} \cdots V_{2.3 \ldots n}\, C_{01.23 \ldots n} \qquad (2)$$

Equations 1 and 2 demonstrate that it is possible to express determinants in the notation of variances and covariances of higher orders. The following derivations, based upon this translation, yield formulas for several coefficients used in multivariate correlation in both variance-covariance and determinant form.

The notation scheme for the formulas presented below is as follows: superscripts refer to the composition of the determinant with reference to its constituent variates, and subscripts indicate the row and column eliminated in computing a minor. As stated earlier, independent or predictor variates are designated 1, 2, 3, ... n, and the dependent variate or criterion is designated ‘‘0’’. The total number of variates may be indicated for greater clarity as (n + 0), rather than (n + 1). Any sub-grouping of (n + 0) will be designated ‘‘q’’. A major determinant is designated $D^{n+0}$. If the i-th row and the j-th column of $D^{n+0}$ are crossed out the remaining determinant, $D_{ij}^{n+0}$, is called a first minor.** A first minor of the type $D_{ii}^{n+0}$ is a principal first minor. Crossing out two rows and two columns results in a determinant indicated as $D_{ii,jj}^{n+0}$. Formulas may be written in determinant form for the partial variance, partial covariance, betas, multiple R, partial r, and varieties of multiple-partial and multiple-part correlation.

It is convenient to express Equation 1 as 


0 
2 VnV(n-1). nV(n-2). (n-1)- 
Vi.23...nV0.,,...n (la) 


and similarly, from Equation 1, we may write 


n+0 
Doo = VnV(n-1). nV¥(n-2). (n-1)n- 


Vi.23...n (3) 


Then, dividing (la) by (3) we obtain: 


pro 


Vo.i2...n = a (4) 
DOO 


in which $V_{0.12\ldots n}$ is the residual variance of 0,
after variance associated with variates 1, 2, 3 ... n
has been removed.
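Equation 4 can be checked numerically. The following is a minimal sketch, not part of the original derivation: the correlation values are hypothetical, the helper names `det` and `minor` are introduced only for illustration, and the cross-check uses the familiar two-predictor formula for the squared multiple correlation.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, a in enumerate(m[0]):
        sub = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(sub)
    return total


def minor(m, i, j):
    """Unsigned minor: determinant left after crossing out row i and column j."""
    return det([row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i])


# Hypothetical correlations (illustrative values, not taken from the article).
r12, r02, r01 = 0.3, 0.4, 0.5

# Variates ordered 2, 1, 0 so that the criterion sits at the extreme right.
R = [
    [1.0, r12, r02],
    [r12, 1.0, r01],
    [r02, r01, 1.0],
]

# Equation 4: residual variance of 0 as the ratio D^{n+0} / D_00^{n+0}.
v_by_determinants = det(R) / minor(R, 2, 2)

# Cross-check: 1 - R^2 from the two-predictor multiple correlation formula.
r_squared = (r01 ** 2 + r02 ** 2 - 2 * r01 * r02 * r12) / (1 - r12 ** 2)
v_by_formula = 1 - r_squared

print(round(v_by_determinants, 4), round(v_by_formula, 4))
```

Because the matrix holds correlations, the criterion has unit variance and the ratio is directly the proportion of variance left unexplained.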

From Equation 2 it is also seen that 
\[
D_{01}^{n+0} = V_n V_{(n-1).n} \cdots V_{2.34\ldots n}\, C_{01.23\ldots n} \tag{2a}
\]


If a principal minor is obtained by crossing out
two rows and two columns from $D^{n+0}$ rather than
one row and one column, the resultant expression
is analogous to (1a), so that

\[
D_{11,00}^{n+0} = V_n V_{(n-1).n} V_{(n-2).(n-1)n} \cdots V_{2.34\ldots n} \tag{5}
\]


Then, dividing (2a) by (5) we obtain

\[
C_{01.23\ldots n} = \frac{D_{01}^{n+0}}{D_{11,00}^{n+0}} \tag{6}
\]


which is the partial covariance between variates 
0 and 1, after variance associated with variates 
2, 3, 4...n has been removed. 

The beta coefficient is defined as the ratio of a 
partial covariance to a partial variance of the 
same order; in notation, 


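\[
\beta_{ij.q} = \frac{C_{ij.q}}{V_{j.q}} = \frac{C_{ij.q}}{C_{jj.q}} \tag{7}
\]

where $V_{j.q}$ is always equivalent to $C_{jj.q}$. Therefore,

\[
\beta_{01.23\ldots n} = \frac{C_{01.23\ldots n}}{V_{1.23\ldots n}} \tag{8}
\]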


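By analogy from Equation 4,

\[
V_{1.23\ldots n} = \frac{D_{00}^{n+0}}{D_{11,00}^{n+0}} \tag{9}
\]

\[
V_{0.23\ldots n} = \frac{D_{11}^{n+0}}{D_{11,00}^{n+0}} \tag{9a}
\]

It follows from (6), (8), (9) and (9a) that

\[
\beta_{01.23\ldots n} = \frac{D_{01}^{n+0}}{D_{00}^{n+0}} \tag{10}
\]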
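The intermediate step behind the determinant form of the beta coefficient may be written out as follows. This is a worked restatement using only the definition of beta as a partial covariance over a partial variance together with Equations 6 and 9; the common second-order minor cancels:

```latex
\[
\beta_{01.23\ldots n}
  = \frac{C_{01.23\ldots n}}{V_{1.23\ldots n}}
  = \frac{D_{01}^{n+0} / D_{11,00}^{n+0}}{D_{00}^{n+0} / D_{11,00}^{n+0}}
  = \frac{D_{01}^{n+0}}{D_{00}^{n+0}}
\]
```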


The coefficient of multiple correlation is de- 
fined as 





\[
R_{0(12\ldots n)} = \sqrt{1 - V_{0.12\ldots n}} \tag{11}
\]


By substitution from Equation 4 we obtain 


\[
R_{0(12\ldots n)} = \sqrt{1 - \frac{D^{n+0}}{D_{00}^{n+0}}} \tag{12}
\]





The coefficient of partial correlation, $r_{01.23\ldots n}$,
may be expressed in terms of partial variances
and covariances as

\[
r_{01.23\ldots n} = \frac{C_{01.23\ldots n}}{\sqrt{V_{1.23\ldots n}}\,\sqrt{V_{0.23\ldots n}}} \tag{13}
\]


By substitution from (6), (9) and (9a) we obtain,

\[
r_{01.23\ldots n} = \frac{D_{01}^{n+0}}{\sqrt{D_{11}^{n+0}}\,\sqrt{D_{00}^{n+0}}} \tag{14}
\]
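As an illustration of the determinant form of the partial correlation, the sketch below evaluates the three unsigned first minors for a hypothetical three-variate correlation matrix and compares the result with the textbook first-order partial correlation formula. The values and the helper names are assumptions made for the example only; the variates are arranged so that 1 and 0 are adjacent at the extreme right, as the footnote on minors requires.

```python
import math


def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, a in enumerate(m[0]):
        sub = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(sub)
    return total


def minor(m, i, j):
    """Unsigned minor: determinant left after crossing out row i and column j."""
    return det([row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i])


# Hypothetical correlations (illustrative values, not taken from the article).
r12, r02, r01 = 0.3, 0.4, 0.5

# Order of variates: 2 (control), 1, 0 (criterion).
R = [
    [1.0, r12, r02],
    [r12, 1.0, r01],
    [r02, r01, 1.0],
]
IDX_1, IDX_0 = 1, 2  # positions of variates 1 and 0 in the matrix

d01 = minor(R, IDX_0, IDX_1)  # row of 0 and column of 1 crossed out
d11 = minor(R, IDX_1, IDX_1)
d00 = minor(R, IDX_0, IDX_0)

# Determinant form of the partial correlation between 0 and 1, holding 2 constant.
partial_by_minors = d01 / math.sqrt(d11 * d00)

# Cross-check with the first-order partial correlation formula.
partial_by_formula = (r01 - r02 * r12) / math.sqrt((1 - r02 ** 2) * (1 - r12 ** 2))

print(round(partial_by_minors, 4), round(partial_by_formula, 4))
```

No sign adjustment is applied to the minor, in keeping with the article's preference for the unsigned notation.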





A related statistic, the coefficient of part correlation
(sometimes called ‘‘semi-partial’’ r), is useful for relating a
dependent variate, 0, to some independent variate, 1, after variance
in 1 associated with one or more additional or control variates has
been removed from 1, but not, of course, from 0. For example, in
contrast to partial correlation, where we are correlating two
residuals, $z_{0.23\ldots n}$ with $z_{1.23\ldots n}$, in part
correlation we are correlating $z_0$ with $z_{1.23\ldots n}$. The
formula for part correlation may be written


\[
r_{0(1.23\ldots n)} = \frac{C_{01.23\ldots n}}{\sqrt{V_{1.23\ldots n}}} \tag{15}
\]


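Substituting from (6) and (9) we obtain

\[
r_{0(1.23\ldots n)} = \frac{D_{01}^{n+0} \big/ D_{11,00}^{n+0}}{\sqrt{D_{00}^{n+0} \big/ D_{11,00}^{n+0}}}
\]

and then simplifying,

\[
r_{0(1.23\ldots n)} = \frac{D_{01}^{n+0}}{\sqrt{D_{11,00}^{n+0}\, D_{00}^{n+0}}} \tag{16}
\]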





or analogously, from (6) and (9a),

\[
r_{1(0.23\ldots n)} = \frac{D_{01}^{n+0}}{\sqrt{D_{11,00}^{n+0}\, D_{11}^{n+0}}} \tag{16a}
\]
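The two part correlations can be illustrated in the same way. The sketch below is an assumption-laden example rather than the authors' computing routine: the correlations are hypothetical, the helper names are invented for the illustration, and the check relies on the standard first-order part (semi-partial) correlation formula with a single control variate.

```python
import math


def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, a in enumerate(m[0]):
        sub = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(sub)
    return total


def minor(m, i, j):
    """Unsigned minor: determinant left after crossing out row i and column j."""
    return det([row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i])


# Hypothetical correlations (illustrative values, not taken from the article).
r12, r02, r01 = 0.3, 0.4, 0.5

# Order of variates: 2 (control), 1, 0 (criterion).
R = [
    [1.0, r12, r02],
    [r12, 1.0, r01],
    [r02, r01, 1.0],
]

d01 = minor(R, 2, 1)
d00 = minor(R, 2, 2)
d11 = minor(R, 1, 1)
# Second-order principal minor: rows and columns of both 1 and 0 crossed out.
d11_00 = det([row[:-2] for row in R[:-2]])

# Part correlation of 0 with the residual of 1, and of 1 with the residual of 0.
part_0_with_1_residual = d01 / math.sqrt(d11_00 * d00)
part_1_with_0_residual = d01 / math.sqrt(d11_00 * d11)

# Cross-checks with the first-order part correlation formula.
check_0 = (r01 - r02 * r12) / math.sqrt(1 - r12 ** 2)
check_1 = (r01 - r12 * r02) / math.sqrt(1 - r02 ** 2)

print(round(part_0_with_1_residual, 4), round(part_1_with_0_residual, 4))
```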


In some instances it is desired to partial out the variance
associated with a number of variates, q, from the remaining variates.
Treating one of these resulting residual variates as the criterion,
the multiple correlation between it and the residual predictor
variates may be obtained. This may be expressed as the
multiple-partial R, as described by Cowden (2), which in the notation
used by DuBois (4) is

\[
R_{(0.q)[1.q,\,2.q,\,\ldots(n-q).q]} = \sqrt{1 - \frac{V_{0.12\ldots n}}{V_{0.q}}} \tag{17}
\]


Again substituting, we obtain in determinant form
the equation for the multiple-partial correlation:

\[
R_{(0.q)[1.q,\,2.q,\,\ldots(n-q).q]} = \sqrt{1 - \frac{D^{n+0}\, D_{00}^{q+0}}{D_{00}^{n+0}\, D^{q+0}}} \tag{18}
\]


If the q variates are partialled from the (n - q)
predictors but not from the criterion, 0, we may
obtain the multiple-part correlation between
(n - q) modified independent variates and an unmodified
criterion. This coefficient is written

\[
R_{0[1.q,\,2.q,\,\ldots(n-q).q]} = \sqrt{V_{0.q} - V_{0.12\ldots n}} \tag{19}
\]


In determinant form this multiple-part correlation is

\[
R_{0[1.q,\,2.q,\,\ldots(n-q).q]} = \sqrt{\frac{D^{q+0}}{D_{00}^{q+0}} - \frac{D^{n+0}}{D_{00}^{n+0}}} \tag{20}
\]


Conversely, the multiple-part R of the type where
we wish to obtain the correlation between (n - q)
unmodified independent variates and a residual
criterion, (0.q), may also be demonstrated. For
simplicity, the criterion (0.q) is represented
as 0', where

\[
(0.q) = 0' = \frac{z_{0.q}}{\sigma_{0.q}}
\]


Then

\[
R_{(0.q)[12\ldots(n-q)]} = \sqrt{1 - V_{0'.12\ldots(n-q)}} \tag{21}
\]


In determinant form, by analogy from (4) we know
that

\[
V_{0'.12\ldots(n-q)} = \frac{D^{(n-q)+0'}}{D_{0'0'}^{(n-q)+0'}} \tag{22}
\]


Then, by substituting (22) in (21),

\[
R_{(0.q)[12\ldots(n-q)]} = \sqrt{1 - \frac{D^{(n-q)+0'}}{D_{0'0'}^{(n-q)+0'}}}
\]


From the earliest years of multivariate correlation, determinants
have been used as a means of expressing pertinent numerical
operations. Pearson (14) explicitly discusses the use of determinants
in an article published in 1903. Except for minor differences in the
notational scheme, our formulas for multiple and partial correlation
and for regression coefficients are the same as those presented by a
number of authors (9, 12, 13). From the practical point of view it is
apparent that if one begins with a matrix of correlations, the
computation of these several coefficients by solving determinants
with a desk calculator is uneconomical. The variance-covariance
formulation is more direct and permits retention of statistical
meaning at every step in the computations. On the other hand, where
electronic computer programs have already been developed for
evaluating determinants, there is ample justification for using the
determinantal formulas as an alternate approach.


FOOTNOTES 


* Prepared in part under Contract Nonr 816(02) between the Office of
  Naval Research and Washington University. Opinions expressed are
  those of the authors and are not to be construed as representing
  the endorsement of the Department of the Navy.

** In evaluating minors of the type $D_{ij}$ it is customary to take
  into account the sign attached to the ij-th position; in this
  instance the signed minor may be referred to as the cofactor
  $A_{ij}$. The sign of the ij-th position may be determined by a
  formula such as that described by Kelley (13):

  \[
  A_{ij} = (-1)^{i+j} D_{ij}
  \]

  However, the format used to evaluate first minors by the
  variance-covariance procedure is such that in the arrangement of
  the matrix, the i-th and j-th variates are always adjacent at the
  extreme right of the table. Hence, the quantity $(-1)^{i+j}$ is
  invariably negative. Throughout this paper we have preferred to
  present formulas in the notation $D_{ij}$ rather than $A_{ij}$;
  otherwise the sign attached to the ij-th position must be taken
  into account.
AN EXPERIMENTAL EVALUATION OF TWO
DIFFERENT PROGRAMS OF TEACHING
HEALTH IN THE SIXTH GRADE AND
THE ADMINISTRATIVE IMPLICATIONS INVOLVED


ARTHUR M. JENSEN 
Tuttle Elementary Demonstration School 
Minneapolis, Minnesota 


Problem and Its Significance 





THE IMPORTANCE of problems confronting 
the administrator and his staff is determined by 
the interpretations of measures of effectiveness 
for the particular situation under consideration. 
Moreover, the urgency with which a staff views 
the need for study of what is happening in the 
learning situation, will largely determine the ex- 
tent of participation and involvement. The study 
reported here is an examination of how such prob- 
lems can be effectively worked out in a school. 
People in health education have had a very active 
interest in studying the nature of the instructional 
program by which they might best achieve the ob- 
jectives of health education. If one is to evaluate 
the effectiveness of any program which is de- 
signed to improve on existing programs or if two 
or more new programs are under consideration,
the method which can give the most precise meas- 
urement is that of the modern experiment. 

The study which we shall consider here is a 
comparative experiment to determine the relative 
merits of two programs which have a common 
function. To make a comparison between the two 
methods, this investigation was designed to examine
the achievement toward the objectives of health
education in the sixth grade in the elementary 
schools. 


Distinctive Features of Treatments 





Two treatments of curriculum content were employed in this study.
The procedure for arranging the content consisted of organizing the
objectives of health education in a manner that would assure the
children that the curriculum for the sixth grade would be taught with
the best possible instruction by either treatment. One treatment of
the content employed the unit organization of subject matter and
included teacher-pupil planning techniques and the problem-solving
approach to learning. The other treatment followed the commonly used
method of integrating the curricular content for health education
into the basic subject areas, such as social studies, science,
reading, arithmetic, etc., together with topical arrangement of
content where integration did not appear to be feasible. These two
treatments of subject matter were carefully described and followed by
both teachers during the two-year period of the experiment. Each
teacher kept a log of activities for the first year. These logs were
exchanged the second year to guide each teacher in the use of the
alternate treatment.


The Situation for the Experiment 





This experiment was primarily concerned with the pupils in the sixth
grade at the Tuttle Elementary School, which is a regular public
school operated under the rules and regulations of the Minneapolis
Board of Education. It is a school in a middle-class neighborhood in
an older part of the city. Since it is in the proximity of the
University of Minnesota, it frequently provides opportunities for
students to observe demonstrations in teaching techniques. The
enrollment ranges from 550 to 600 in grades kindergarten through six.
Records indicate that the average ability of the children over
several years was between 98 I.Q. and 102 I.Q. There were 15 regular
classroom teachers employed at the time of this study. The principal
serves two schools, this one three days per week and a nearby school
two days per week. There are part-time certificated personnel, who
include a nurse, a visiting teacher, a physical education teacher,
and a speech teacher. Consultants are available on call in art,
physical education, music, science, and general curriculum. In
addition, there are three custodians and a full-time clerk serving
the school. A school situation such as described would represent any
of a number of schools in Minneapolis which are comparable in size
and where the conditions may approximate those found in this school
and community. This does not, however, permit occasion to generalize
the results beyond the evidence found in the situation except by
implication.


This report is based on: Arthur M. Jensen, An Experimental Evaluation
of Two Different Programs of Teaching Health in the Sixth Grade and
the Administrative Implications Involved, unpublished Ph.D.
dissertation, University of Minnesota, 1958. Dr. Palmer O. Johnson
and Dr. Otto E. Domian, Co-advisors.

**Palmer O. Johnson. Statistical Methods in Research (New York:
Prentice Hall, 1949), pp. 109-202.


Population and Sample 





The population studied is that of an aggregate of all sixth-grade
pupils in the Tuttle School over an undefined period of time.
Basically, the assumption was that this population will persist until
such time that the socio-economic and other related factors change
the constituents of the population. The assumption made was that the
classes for the particular years used in this study could have come
from this population. A study of the means and variances of the
population over a four-year period of time from 1953 through 1957
showed that the classes were comparable at the 5 percent level of
significance with respect to intelligence quotients. This was
determined by testing the homogeneity of the variances and the
equality of the means. The L1 test was used to test the differences
between the variances and the analysis of variance was used to test
the differences between the means. The samples or classes used in the
experiment were shown to be representative of this population, which
was comprised of the sixth-grade pupils for the years 1955-56 and
1956-57. A total of about 60 pupils was selected for each year,
comprising a total of about 120 cases. In the final outcome, a total
of 96 pupils was used due to late entrants and dropouts. There were
four classes used, two in any given year. Assignment to these classes
was made at random.


Experimental Design 





From the standpoint of experimental designs, we may designate this
study as unrestricted randomization. The pupils were assigned to the
classes, each of which was taught by a different method. This method
of selection to classes provided a means whereby every pupil in the
sixth grade for the years 1955 through 1957, at Tuttle School, had an
equal chance of being in a particular class with a particular
program; these programs we have referred to as treatments. Each year
of the two-year period approximately 60 students were randomly placed
into two classes of equal size. The process of assigning pupils at
random to one or the other treatment was the standard one using
random-sampling numbers. Briefly, this consisted of giving every
sixth-grade pupil in the experiment a number and then entering a
table of random-sampling numbers in the accepted manner and placing
each pupil into one or the other sixth-grade class.** A coin flip
determined the teaching treatment for each class: the second class
received the alternative treatment, with the restriction that during
the second year of the experiment the teachers were assigned to the
alternative treatment.
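The assignment procedure for a single year can be sketched in modern terms as follows. This is only an illustration under stated assumptions: a seeded pseudo-random generator stands in for the printed table of random-sampling numbers, the pupil numbers 1 to 60 and the seed are hypothetical, and the function name is invented for the example.

```python
import random


def assign_year(pupil_numbers, seed=None):
    """Split one year's pupils at random into two equal classes and flip a
    coin for which treatment the first class receives."""
    rng = random.Random(seed)  # stand-in for a table of random numbers
    order = list(pupil_numbers)
    rng.shuffle(order)
    half = len(order) // 2
    first_class, second_class = order[:half], order[half:]

    treatments = ["unit", "integrated"]
    first_treatment = rng.choice(treatments)  # the coin flip
    second_treatment = treatments[1 - treatments.index(first_treatment)]
    return {first_treatment: first_class, second_treatment: second_class}


# Hypothetical roster of 60 numbered pupils; the seed is arbitrary.
classes = assign_year(range(1, 61), seed=1955)
print({treatment: len(members) for treatment, members in classes.items()})
```

In the second year the coin flip would be replaced by the restriction described above, so that each teacher takes the treatment not taught the year before.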

Preliminary to the three-way analysis of the 
factors of teacher, treatment, ability level and 
their interactions, the absolute gains in achieve- 
ment were analyzed. This was done by the use of 
the ‘‘t’’-test for the significance of the difference 
between the means on the pre- and post-tests. 
Likewise, the variances were used for testing the 
significance of the difference in variability on the 
pre- and post-tests. The appropriate ‘‘t’’-tests 
for correlated data were used in both cases. 

The design was a 3 X 2 X 2 arrangement of data
involving teacher, treatment and student ability 
level. The analysis of variance and covariance 
were utilized in the analysis of the data. Both the 
absolute gains and the relative gains in achieve- 
ment were studied in the analysis of the data. 

The inventories of attitudes were studied primarily for the changes
in attitudes that children may have undergone from pre- to
post-testing and to determine whether or not the children responded
differently from the teachers’ expected responses. The Chi-square
test of independence was applied for testing out differences in
attitudes between treatments which may have existed. The results of
the inventory of attitudes administered to parents were treated in a
similar manner.


Evaluation Instruments 





One achievement test for health knowledge con- 
structed by the writer and a commercial test were 
used to measure achievement. Careful attention 
was given to item analysis, defining validity, test- 
ing out reliability, and preliminary testing for the 
constructed test. In addition, two inventories of 
attitudes were used for the purposes discussed 
previously. 


Instructional Methods 





An entire year’s work based on the health curriculum for the sixth
grade in the Minneapolis Public Schools was planned and taught by two
regular teachers. The curriculum was based upon the scope and
sequence of the content found in an examination of the children’s
textbooks on health and the health education guide furnished by the
schools, together with research recommendations. This procedure
resulted in organizing the content into nine broad categories of
health knowledge and attitudes for identification. A basic assumption
was made that the experimental class which used the unit arrangement
of subject matter and teaching technique would include the same
year’s curriculum content as the control class. This class had the
content integrated into other subject areas and was taught by the
method generally used in the schools.

The general plan for unit development was utilized by the
experimental group throughout the study. A well-established procedure
followed by the schools was recommended. This plan is often analyzed
into several stages of development, for example: introduction,
exploration, problem-setting, problem-solving, summarizing and
evaluating. Each unit that was selected was developed by
teacher-pupil planning techniques. The contrasting or control method
was primarily that of integrating health education into the basic
areas of the curriculum. This is frequently supplemented by topical
arrangement of material not easily integrated. It likewise includes
timely topics and teachable moments when events or experiences arise
that need immediate attention. However, experience has demonstrated
that mutually exclusive methods of teaching are hard to find. A
serious attempt was made by the participating teachers to constantly
evaluate their work in terms of the type of teaching that they were
doing in order not to introduce bias into the experiment.


Analysis of the Experimental Data 





An over-all description of the experimental results for the Health
Knowledge and Attitude Test is observable in Table I. This summary
table for the means and standard deviations shows that the results on
the means followed a uniform pattern, that is, the means increased
between the pre- and post-tests. This same table shows that a
reduction in the standard deviations occurred between the pre- and
post-tests. The main analysis was concerned with the comparison
between treatments. An analysis of the experimental data was made
first to determine if there had been a significant growth under each
of the treatments. The difference between the means on the pre-test
and the post-test was tested for statistical significance by the
application of the appropriate ‘‘t’’-test of significance for
correlated data. An appropriate ‘‘t’’-test on the significance of the
difference between the variances was also made on the pre- and
post-tests.

The basic data and results of the test of significance for the Health
Knowledge and Attitudes Test are found in Table II. Table II reveals
that the sixth-grade students in each of the four classes were
consistent in making gains. The gain in mean performance ranged from
8.79 to 12.54 and all gains were statistically significant.
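The test on the mean gains can be retraced from the figures reported in Table II. The sketch below assumes that the ‘‘t’’ ratio for each class is the mean gain divided by the standard error of that gain; this assumption reproduces the value 4.75 that is legible in the table for the second-year unit class. The variable names are chosen for the illustration.

```python
# (year, group): (difference between means, standard error of the difference),
# as reported in Table II.
gains = {
    ("First", "Integrated"): (9.83, 1.75),
    ("First", "Unit"): (10.63, 1.70),
    ("Second", "Integrated"): (8.79, 2.31),
    ("Second", "Unit"): (12.54, 2.64),
}

# Assumed form of the test statistic for correlated data: gain / standard error.
t_ratios = {
    group: round(gain / std_error, 2)
    for group, (gain, std_error) in gains.items()
}

for (year, group), t in t_ratios.items():
    print(f"{year:7s}{group:11s}t = {t:.2f}")
```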





The next comparison of pre-test and post-test 
performance scores involved the variances. Table 
III contains the initial and final variances and the 
test of significance. 

It is observed from Table III that in three of the
four classes there was a reduction in variability 
within classes. 

The second and main analysis concerns the de- 
termination of the differential effects of the con- 
trasting treatments. Using the three-way classifi- 
cation system for organization of the data provid- 
ed an analysis giving meaningful interpretation re- 
garding treatments, teachers and mental ability. 
This also made it possible to study the several in- 
teraction effects. Shown in Table IV are the var- 
ious effects set forth and the number of degrees of 
freedom assigned to each. In addition, this table
shows the interactions that were studied. 

This tabular arrangement illustrates in concise 
form the basis for testing out the various hypothe- 
ses of the effects of teacher, treatment, ability 
and interactions. 
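The degrees of freedom for such a three-way classification follow from the number of levels of each factor. The sketch below assumes two treatments, two teachers, three ability levels and the 96 pupils reported for the final sample, and applies the usual rules: levels minus one for a main effect, the product of those quantities for an interaction, and pupils minus cells for the residual. The names are chosen for the illustration.

```python
from itertools import combinations

# Assumed number of levels for each factor in the design.
levels = {"Treatment": 2, "Teacher": 2, "I.Q.": 3}
n_pupils = 96

df = {}
for size in range(1, len(levels) + 1):
    for combo in combinations(levels, size):
        product = 1
        for factor in combo:
            product *= levels[factor] - 1
        df[" X ".join(combo)] = product

n_cells = 1
for count in levels.values():
    n_cells *= count

df["Residual or error"] = n_pupils - n_cells
df_total = n_pupils - 1

for source, value in df.items():
    print(f"{source:32s}{value:3d}")
print(f"{'Total':32s}{df_total:3d}")
```

Under these assumptions the residual and total agree with the 84 and 95 degrees of freedom shown in Table IV.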

The analysis of the means on the Health Knowledge and Attitudes Test
was accomplished by the analysis of variance technique. There were
significant differences among the three I.Q. groups as shown in
Table V. This motivated the study of the adjusted means.

The relationship between ability and achievement for three of the
four classes on initial testing indicated that the means of the
highest I.Q. level were above those of the middle I.Q. level and the
means of the middle I.Q. level were higher than the means of the
lowest I.Q. level. The exception occurred with the integrated group
with Teacher (1), where the initial mean was higher for the middle
I.Q. group than for the highest I.Q. group. This situation was
occasion to re-analyze the final test scores by adjusting for the
initial differences by the analysis of covariance. Table VI contains
this analysis.

An examination of Table VI shows that no significant differences were
found between the mean achievement under the two treatments, nor were
any of the interactions found to be statistically significant. There
was, however, a significant difference among the means of the
students of different levels of ability.
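The arithmetic behind the nonsignificant interactions can be retraced from Table VI. The sketch below uses only the rows whose degrees of freedom and adjusted sums of squares are fully legible, recomputes each mean square as the sum of squares divided by its degrees of freedom, and forms the ratio to the residual mean square. The residual has 83 rather than 84 degrees of freedom because one is used by the adjustment for initial scores. Variable names are chosen for the illustration.

```python
# source: (degrees of freedom, adjusted sum of squares), from Table VI.
rows = {
    "Treatment X I.Q.": (2, 23.18),
    "Teacher X I.Q.": (2, 77.94),
    "Treatment X Teacher X I.Q.": (2, 42.92),
    "Residual": (83, 6096.41),
}

mean_squares = {source: ss / d for source, (d, ss) in rows.items()}

# Variance ratio of each interaction against the residual mean square.
f_ratios = {
    source: ms / mean_squares["Residual"]
    for source, ms in mean_squares.items()
    if source != "Residual"
}

for source, ms in mean_squares.items():
    print(f"{source:30s}{ms:8.2f}")
```

Every interaction mean square is smaller than the residual mean square, which is consistent with the NS entries in the table.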


Investigation of Health Attitudes 





The Chi-square test of significance of independence of change in
responses to items from pre- to post-testing was utilized for the
student and parent inventories. Pupil responses were compared to
responses selected by the teacher by use of the Chi-square test of
independence. The pre- and post-responses by treatment were also
compared.

The results on the inventories of attitudes which 
were specifically prepared for the study indicated 
that: 
TABLE I


MEANS AND STANDARD DEVIATIONS FOR THE FOUR CLASSES BY TEACHER
AND TREATMENT FOR THE HEALTH KNOWLEDGE AND ATTITUDE TEST


                          Teacher (1)               Teacher (2)
                            Method                    Method
                     Integrated     Unit       Integrated     Unit

Means
  Pre                   77.46       72.37         79.21
  Post                  87.29       84.91         88.00
  Difference             9.83       12.54          8.79

Standard Deviations
  Pre                   14.74       19.09         13.05
  Post                  10.34       12.96         13.66
  Difference             4.40        6.03           .61*


* The only Standard Deviation that increased. All the others decreased.


TABLE II


TEST OF THE SIGNIFICANCE OF THE MEAN GAIN FOR EACH CLASS
ON THE HEALTH KNOWLEDGE AND ATTITUDES TEST


                       Difference    Standard Error
                        Between      of Difference
Year     Group           Means         in Means       ‘‘t’’    D.F.   Conclude*

First    Integrated       9.83           1.75
First    Unit            10.63           1.70
Second   Integrated       8.79           2.31
Second   Unit            12.54           2.64          4.75


*S = Significant gain (.05 level); all significant at .001 level.
TABLE III


VARIANCES AT THE BEGINNING AND END OF THE SIXTH GRADE AND THE
SIGNIFICANCE OF THEIR DIFFERENCE FOR HEALTH KNOWLEDGE
AND ATTITUDES TEST


Year     Group         S² pre    S² post    Difference    D.F.   Conclude*

First    Integrated    217.30    107.08      -110.22       22
First    Unit          349.26    292.25       -57.01       22
Second   Integrated    170.35    190.87        20.52       22
Second   Unit          364.77    168.17      -196.60       22


* NS = Not significant. S = Significant.


TABLE IV


SOURCES OF VARIATION AND DEGREES OF FREEDOM


Source of Variation               Degrees of Freedom

Treatments
Teachers
Ability levels (I.Q.)
Treatment X Teacher*
Treatment X I.Q.
Teacher X I.Q.
Method X Teacher X I.Q.                    2
Residual or error                         84

Total                                     95

* The treatment X teacher interaction is confounded with year
differences.
TABLE V


MEAN SCORES ON THE HEALTH KNOWLEDGE AND ATTITUDES
TEST FOR EACH OF THE I.Q. GROUPS AT THE END OF
THE EXPERIMENT


I.Q. Group           Mean Scores

Upper third             93.66
Middle third            88.97
Lower third             75.97



TABLE VI


ANALYSIS OF MEANS ON HEALTH KNOWLEDGE AND ATTITUDES TEST AT
END OF EXPERIMENT ADJUSTING FOR INITIAL SCORES


                                        Sums of
                                        Squares       Mean
Source of Variation            D.F.     Adjusted     Squares    Conclude*

Treatment                       1         ….30        ….30         NS
Teacher                         1        13.30       13.30         NS
I.Q.                            2         ….18        ….09         S
Treatment X Teacher             1         5.13        5.13         NS
Treatment X I.Q.                2        23.18       11.59         NS
Teacher X I.Q.                  2        77.94       38.97         NS
Treatment X Teacher X I.Q.      2        42.92       21.46         NS
Residual                       83     6,096.41       73.45

Total                          94

* NS = Not significant. S = Significant.
1. Changes in responses to items on the Health
   Attitude Inventory were significant for the
   second year of the experiment at the 5 percent
   level.


2. When comparison was made of selected
   items by treatment effects, only one item
   was significant at the 5 percent level.


3. The results were not significantly different
   when the pupil responses were compared to
   those deemed desirable by the teachers.


4. Pre- and post-responses compared by treatment
   were found to be statistically significant.


5. When responses from pre to post on the parents’
   Inventory of Health Practices of Children
   were studied, there was a change in
   attitudes at the end of the study as compared
   to those at the beginning. There
   was also a statistically significant difference
   between treatments.


Summary 


The main conclusions based on the evidence 
presented in the reported study are: 


1. The effects in promoting achievement of the
   objectives of health education showed a significant
   gain on the means for each treatment.


2. A comparison of the two treatments did not
   reveal statistically significant differences
   among the classes studied.


3. There were no statistically significant differences
   shown between teachers. The conclusion
   drawn is that the teachers were
   equally effective in using the integrated and
   unit arrangements.


4. Achievement among the I.Q. groups differed
   significantly; that is, the superior students
   surpassed the average and the average surpassed
   the inferior.


5. As far as the observable outcomes were concerned,
   the experimental evidence indicated
   no significant results with respect to differences
   between teachers and between treatments
   and their interactions.


6. Evidence did indicate that some significant
   changes in attitudes among children did take
   place. This was also noted for the parents.


Implications for Administration 








One purpose advanced for this study pertained to the use of the
results for administrative considerations regarding health education.
There were several things that may be implied from the results. In
the first place, it was shown that the involvement of teachers in a
long study resulted in a real challenge and desire on the part of the
teachers to examine their purposes and practices more objectively. A
second outcome that became apparent as the study progressed was the
increased interest the teachers had in the procurement of supplies,
equipment, and related teaching materials for both treatments. Since
the results regarding superiority of one treatment compared to
another were not conclusive, it would not be expedient to adopt one
or the other without more objective evidence. Results from this
experiment may suggest some assurance that either the unit treatment
or the integrated treatment could be employed without loss to the
students in health education. As long as either method is taught at
its best, the administrator could feel a degree of confidence that
the objectives are being fulfilled. A third implication that may be
inferred from the study relative to the cost for the treatments is
that there is not enough evidence to show which treatment would be
less expensive than the other if both achieve the same objectives
satisfactorily.

There are so many possible variations by both 
treatments that the evidence suggests a possible 
combination of both treatments used in this study. 
This appears to hold the promise of utilizing unit 
arrangement for those phases of health education 
that can be organized in that manner and also em- 
ploying an integrated approach to materials that do 
not apply to any organized pattern. 

A general implication that may assist in the administration of the
health program, resulting from the examination of the data, is that
neither teacher, method nor ability level operates in isolation. It
is the composite effects of these and other influences that encourage
learning in health education in meeting the psychological,
sociological, physiological, pedagogical and democratic objectives
of education.


Implications for Further Research in 
Health Education 








In the area of health education this study should have some meaning
for future research. It may suggest that further investigation would
follow up, in greater detail, the observations of the behavior of
children in a health education program conducted under contrasting
treatments. Also, it may suggest that the factor of retention of
information, attitudes, and other related learning experiences would
present a challenging inquiry, particularly in the normal behavior of
health. A follow-up study of what pupils say they do compared to
their observed reactions may be implied as a likely study in health
education. These and other possible directions for future research
also suggest that the instruments for evaluation need to be further
developed to serve these purposes.
THE RELATIONSHIP OF STUDY HABITS
AND OTHER MEASURES TO ACHIEVEMENT
IN NINTH-GRADE
GENERAL SCIENCE


DANIEL P. NORTON 
Hibbing High School 
Hibbing, Minnesota 


Problem 


UNDER THE apparent assumption that certain mechanical procedures are
significant contributors to achievement in the various fields of
learning, a large amount of effort has been directed toward
identifying the procedures which correlate most highly with
achievement. It was the purpose of this study to re-define
studiousness in terms which would permit application of multiple
regression analysis to the question, ‘‘Does achievement in
ninth-grade general science relate more closely to study habits than
intelligence, reading ability, and aptitudes?’’


Population and Sample 





During the school year 1957-58 five general science sections in the
High School Building of the Hibbing, Minnesota, schools were
available for daily observation and constituted the population under
study. No selective procedures were followed in making class
assignments beyond those necessitated by curricular organization.
From these classes it was possible to secure samples of 41 boys and
53 girls who had received training from the same instructors in
seventh- and eighth-grade general science, and whose previous testing
programs had been identical.


Measuring Instruments and Techniques 





The study incorporated six independent vari- 
ables and one dependent. The independent vari- 
ables were: 


Iowa Silent Reading
Iowa Algebra Aptitude
Otis Quick Scoring
Student Rating
Instructor Rating
Differential Aptitudes
    Verbal Reasoning
    Abstract Reasoning
    Space Relations
    Mechanical Reasoning

The first three measurements had been secured 
previously; the raw scores were used. Student 
Rating of study habits and application was secured 
near the conclusion of the course. Each student 
in the sample sections was rated by five other stu- 
dents assigned to the same section in accordance 
with a ratiug scale and instructions developed by 
the instructor. The problems involved in develop- 
ment of a useful Student Rating were (a) selection 
of an appropriate scale, (b) securing an ‘‘honest’’ 
rating quickly, thereby preventing consideration 
of friendships or knowledge of achievement, either 
of which might have influenced results, and (c) min- 
imizing the complications of effort involved by 
identifying the smallest number of such ratings 
necessary for each student to acquire a meaning- 
ful composite. 

Each student rated and was rated five times. 
Ratings were conducted by random assignment 
within sections and followed an accurately tim ed 
schedule that provided fifteen seconds to make the 
first rating requested by the somewhat unfamiliar 
scale. Thirty seconds were allowed to complete 
the other four ratings by the form which follows: 

4    BEST        By the scale at
                 the left, I rate
3                the study habits
                 and application
2    AVERAGE     of

1                    Jane Smith

0    POOREST     as


The five ratings each student received were 
summed in Table I. Reliability of the method was 
untested except by inspection. A consistency of 
rating seemed apparent. While it would have 
been desirable to have no summed ratings of 
twenty, adequacy of rating sample size seems assured.


TABLE I


DISTRIBUTION OF SUM OF STUDENT RATINGS AND INSTRUCTOR
RATINGS BY SEX


             Student              Instructor
          Boys    Girls         Boys    Girls

[Table body illegible in the source scan.]


TABLE II


STATISTICAL SUMMARY FOR STEP ONE BY SEX


Statistic            Boys             Girls

[Row labels illegible in the source scan; some leading digits are missing.]

                    .4146            .2642
                    .8293            .4528
                    .3415            .1736
                    .8293            .6981
                    .0000            .2642
                    .6585            .2311
                    .5610            .1868

                    .5814            .5927
                    .3915            .6448
                    .5533            .7061
                    .5994            .6304
                    .3048            .3761
                    .5595            .7638
                    .5402            .5690
                    .7030            .1477
                    .4012            .6142
                    .2820            .4188
                    .5195            .6403
                    .6664            .6117
                    .3740            .7278
                    .4475            .5157
                    .4539            .5981
                    .4889            .6399
                    .2239            .3522
                    .5753            .7149
                    .5173            .7395
                    .2081            .6449
                    .4490            .3346

                    .14878349        .17503628
                    .74512195        .48331000
                  63.13048785        .15544267
                  14.79512195        .90711176
                  20.35000000        .65965418
                  58.55236280        .34819485
                 101.85248113        .96371917


The same numerical values were utilized in In- 
structor Rating as Student Rating. Five such 
ratings were made at approximately monthly inter- 
vals, one on each of the days of the school week. 
Times selected were made to coincide with major 
tests so that each student would be visible for en- 
hanced recall of behavior, and interruption would 
be least apt to occur. Instructor ratings differed 
most from Student Ratings in that they were meant 
to evaluate not what might be called an ‘‘impres- 
sion’’, but an accumulation of evidence and im- 
pressions. Notes were made in the class grade 
book when a student (a) did not complete an assignment,
(b) had not promptly begun an assignment
when it was made, (c) seemed to daydream, etc. 
These notations were reviewed when ratings were 
made. 

As was the case with Student Ratings, the
distribution for boys is somewhat normal to
inspection; that for the girls is bimodal. There is
also a noticeable skewness to the upper range, for
which a possible explanation is not hard to find.
It was the unsolicited opinion of most instructors
of this class that its members were unusually
studious. This was purportedly most true of the
girls. The author concurs.

The Differential Aptitude Tests were administered
at the start of the school year. The batteries
used were selected for probable lack of
overlapping with other measures. Z-scores were
summed. Raw scores were used for the first
three measures and achievement.

The dependent variable, General Science 
Achievement (Y), was measured by an objective 
test administered at the close of formal class in- 
struction. For outcomes of instruction desired 
in the Hibbing schools, validation by inspection 
was requested of other instructors, who consid- 
ered it good. Validation by scores was measured 
by correlation with those obtained on the Coopera- 
tive General Science Test administered two days 
later. Coefficients for boys and girls were .788 
and .824, respectively. 

Split-half reliability was calculated for odd 
and even items. For boys the absolute and rela- 
tive reliabilities were 4.021 and .770, respec- 
tively; for girls they were 3.607 and .917. 


Analysis 


Analysis proceeded separately for boys and
girls and followed the method outlined by Johnson.1
In step one, intercorrelations and standard
deviations were calculated.2 Zero-order correlations
between the independent variables and the dependent
variable were all positive, the lowest for each
group being $r_{5y}$ (Table II).

In step two, Fisher’s auxiliary statistics, the
$g_{ij}$’s, were calculated for the six systems of
simultaneous equations, after which it was possible
to compute $R_{y.123456}$, the multiple correlation
between $Y$ and $X_1, X_2, \ldots, X_6$.

For step three, define:

$$B_i = \sum_{j} g_{ij}\, r_{jy} \qquad (i, j = 1, \ldots, 6)$$

where $B_i$ is the standard partial regression
coefficient.
Define again:

$$R^2_{y.123456} = \sum_{i} B_i\, r_{iy} \qquad (i = 1, \ldots, 6)$$

The standard partial regression coefficients
are recorded in Table III and the multiple
correlations in Table IV. The latter may be regarded
as very high, indicating that a strong relationship
is present.
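Steps two and three can be sketched numerically as follows. This is only an illustration under stated assumptions: the $g_{ij}$ are taken to be the elements of the inverse of the matrix of intercorrelations among the independent variables, the function name is hypothetical, and the three-variable correlation values are invented for the example rather than taken from the study.

```python
import numpy as np

def standard_partial_regression(r_xx, r_xy):
    """Return (g, B, R2) for a set of standardized predictors.

    r_xx : intercorrelations among the independent variables
    r_xy : correlations of each independent variable with Y
    """
    r_xx = np.asarray(r_xx, dtype=float)
    r_xy = np.asarray(r_xy, dtype=float)
    g = np.linalg.inv(r_xx)   # auxiliary statistics g_ij (assumed: inverse matrix)
    B = g @ r_xy              # B_i = sum_j g_ij * r_jy
    R2 = float(B @ r_xy)      # R^2 = sum_i B_i * r_iy
    return g, B, R2

# Hypothetical correlations for three predictors (not the study's data).
r_xx = [[1.0, 0.5, 0.3],
        [0.5, 1.0, 0.4],
        [0.3, 0.4, 1.0]]
r_xy = [0.6, 0.5, 0.4]

g, B, R2 = standard_partial_regression(r_xx, r_xy)
print("B  =", np.round(B, 4))
print("R2 =", round(R2, 4), " R =", round(R2 ** 0.5, 4))
```

With six predictors the same call applies unchanged; only the size of the two inputs grows.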

In step four the significance of $R_{y.123456}$ was
tested by means of the variance ratio, which in
this case is the ratio of the mean square associated
with regression to the residual mean square:

$$F \text{ (variance ratio)} = \frac{R^2 (N - m - 1)}{m\,(1 - R^2)},$$

where $m$ is the number of degrees of freedom
associated with regression (in this case six), and
other symbols are as used previously. The
results were statistically significant (Table V).
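The variance ratio can be checked against the reported figures with a few lines of arithmetic. The sketch below assumes the formula as reconstructed above, takes the $R^2$ values from Table IV and the sample sizes of 41 boys and 53 girls from the text, and uses a hypothetical function name; it gives values of approximately 9.77 and 15.26, which agree with Table V to within rounding.

```python
def variance_ratio(r_squared, n, m):
    """F = R^2 (N - m - 1) / (m (1 - R^2)) for m predictors and N cases."""
    return r_squared * (n - m - 1) / (m * (1.0 - r_squared))

# R^2 values as reported in Table IV; N from the sample description.
f_boys = variance_ratio(0.63294503, n=41, m=6)
f_girls = variance_ratio(0.66562245, n=53, m=6)

print(round(f_boys, 2), round(f_girls, 2))
```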

The significance of each $B_i$ was tested in step
five. Define:

$$s_{B_i} = \sqrt{\frac{(1 - R^2_{y.123456})\, g_{ii}}{N - m - 1}},$$

where $s_{B_i}$ is the standard error of $B_i$. The test
of significance of each $B_i$ is given by:

$$t = \frac{B_i}{s_{B_i}},$$

with N - m - 1 degrees of freedom.
For both sexes, $B_6$, calculated from Differential
Aptitude Test data, was significant at the one


1. Palmer O. Johnson. Statistical Methods in Research (New York: Prentice-Hall, Inc., 1949). 





2. The number of decimals carried seems an absolute minimum for the method used and the number of variables involved.





TABLE III


STANDARD PARTIAL REGRESSION COEFFICIENTS (Bi)’s BY SEX


            B1       B2       B3       B4       B5       B6

Boys       .2581    .0496   -.1990    .6220   -.3027    .5237

Girls     -.0300    .2017    .2524    .0342    .0189    .4526





TABLE IV


MULTIPLE CORRELATION BETWEEN THE DEPENDENT
VARIABLE (Y) AND THE INDEPENDENT VARIABLES
(Xi)’s BY SEX


            R²(y.123456)     R(y.123456)

Boys         .63294503         .7956

Girls        .66562245         .8159





TABLE V


PROBABILITIES ASSOCIATED WITH THE OBSERVED
VARIANCE RATIOS BY SEX


            Variance       Probability
            Ratio (F)          (P)

Boys         9.7696           <.01

Girls       15.2600           <.01










TABLE VI


PROBABILITIES ASSOCIATED WITH THE STANDARD PARTIAL REGRESSION
COEFFICIENTS (Bi)’s AS DERIVED BY ‘‘t’’-TEST


                  Boys                          Girls

        t(Bi)    Probability          t(Bi)    Probability

B1               .10 > p > .05                 .90 > p > .
B2               .80 > p > .90        1.542    .20 > p > .
B3               .40 > p > .30                 .20 > p > .
B4               .001 > p                      .90 > p > .
B5               .10 > p > .05                 .90 > p > .
B6               .01 > p > .001                .01 > p > .





TABLE VII


PROBABILITIES ASSOCIATED WITH THE DIFFERENCE BETWEEN
STANDARD PARTIAL REGRESSION COEFFICIENTS
FOR BOYS AND GIRLS AS DERIVED
BY ‘‘t’’-TEST


            t(B4 - B5)      Probability

Boys          6.624          .001 > p

Girls         0.110          p > .90


percent level; for the boys, $B_4$, Student Rating,
was significant at the one-tenth percent level.
Other $B_i$’s were ‘‘large’’ but not enough to permit
assignment of significance at the five percent level
usually considered minimal. The patterns for
boys and girls were noticeably different (Table VI).
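The step-five test of a single coefficient can be sketched as below. The function name is hypothetical, and the value of $g_{ii}$ used in the example is an assumed placeholder, since the auxiliary statistics themselves are not reported; only the coefficient for Student Rating (boys), the boys' $R^2$, and N = 41 come from the tables and text.

```python
import math

def coefficient_t(b_i, g_ii, r_squared, n, m):
    """Return (t, s_b) where s_b = sqrt((1 - R^2) g_ii / (N - m - 1))."""
    s_b = math.sqrt((1.0 - r_squared) * g_ii / (n - m - 1))
    return b_i / s_b, s_b

# B_4 and R^2 for boys as reported; g_ii = 2.0 is an assumed placeholder.
t_value, s_b = coefficient_t(b_i=0.6220, g_ii=2.0,
                             r_squared=0.63294503, n=41, m=6)
print("s_B =", round(s_b, 4), " t =", round(t_value, 3),
      " df =", 41 - 6 - 1)
```

The resulting t would then be referred to a t table with N - m - 1 degrees of freedom, as the text describes.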

Two terminal tests were then made. First, it
was considered valuable to know whether Student
Ratings and Instructor Ratings differed significantly
from each other. Stated in the null, the hypothesis
may be symbolized: $H_0$: $B_4 = B_5$, for each
sex (alternate: $B_4 > B_5$). A ‘‘t’’ test may again
be made in which:

$$t = \frac{B_4 - B_5}{s_{Res}\sqrt{g_{44} - 2g_{45} + g_{55}}}, \qquad s_{Res} = \sqrt{\frac{1 - R^2}{N - m - 1}}$$

The results indicate a highly significant difference
in the ratings for the boys only (Table VII).
The hypothesis is accepted for girls, rejected for
boys.

Second, the proportions of the total variance
of scores on the achievement test accounted for
by the linear combination $X_1, X_2, \ldots, X_6$ had
been found to be 0.633 and 0.666 for boys and
girls, respectively. With only $B_6$ significant at
the prescribed level for girls, no further calculations
were made. For boys, however, with a
sample of forty-one, $B_1$, $B_4$, $B_5$, and $B_6$ may be
taken as significant to a greater or lesser extent.
Additional calculations were elected for them.

The percentage of association of each of these 
independent variables with the variance of the de- 
pendent variable is given by the square of the 
standard partial regression coefficients. Their 
sum may be treated as the total (100%) of the var- 
iance accounted for by the set of four factors. 
The proportion accounted for by the respective 
factors may be easily determined by division and 
reveals 47 percent associated with Student Ratings 
and 33 percent with Differential Aptitude Test 
scores. 
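The division described above can be reproduced from the coefficients in Table III. The sketch assumes that the first row of that table belongs to the boys, and the function name is hypothetical; the shares for Student Rating and the Differential Aptitude Tests come out at about 47 and 33 percent, matching the figures in the text.

```python
def variance_shares(coefficients):
    """Share of each squared coefficient in the sum of squared coefficients."""
    squares = {name: b * b for name, b in coefficients.items()}
    total = sum(squares.values())
    return {name: sq / total for name, sq in squares.items()}

# Coefficients for boys (assumed to be the first row of Table III).
boys = {"B1": 0.2581, "B4": 0.6220, "B5": -0.3027, "B6": 0.5237}
shares = variance_shares(boys)

for name, share in shares.items():
    print(name, f"{100 * share:.0f}%")
```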


Conclusions 


1. This investigation did not find study habits,
as measured by Instructor Rating, more closely
associated with achievement in ninth-grade general
science than intelligence, reading ability,
and aptitudes. When measured by Student Rating,
they were more closely associated for boys.

2. As rated by other students, the study habits
of boys were a statistically significant predictor of
science achievement; as rated by the instructor,
their study habits neared significance negatively.

3. The difference between Instructor Rating
and Student Rating of study habits of boys was
significant beyond the one percent level; the
difference was not significant for girls.

4. Aptitudes, as measured by the Differential
Aptitude Tests, were the most significant predictor
for both sexes considered together.

5. Instructor Rating appeared less valuable for 
predictive purposes than any other independent 
variable. 


Summary and Recommendations 





The study required development of a technique
for measuring study habits or application of students
and subsequent multiple regression analysis
of six independent variables and one dependent
variable. Particular weaknesses were the doubtful
nature of what students actually meant by their
ratings and limited precision of Instructor Rating.
They were offset to an extent by the fact that
correlations secured by Student Rating were as high
or higher than those customarily found by inventory
methods. Also important is the frequent
agreement with results obtained in previous
research.

While the more mechanical aspects of learning 
are easiest to note, this does not preclude that 
they may be more extraneous than basic. Learn- 
ing may proceed more from the less tangible fac- 
tors such as attitude and aptitude than is common- 
ly conceded. If so, the researcher might do well 
to review his mental sets with respect to study 
habits. Perhaps he should investigate the thought 
patterns involved more thoroughly. 

What difference exists between the learning 
processes of girls and boys? Surely they may dif- 
fer in a significant manner. If research to date 
has passed by and failed to identify the factor or 
factors more basic to science achievement, addi- 
tional effort should be expended on that behalf. 
Further, if the achievement patterns of boys are 
typically underestimated as a result of ‘‘study 
habit’’ considerations, care should be exercised 
in their favor. This would be particularly true in 
guidance departments where aptitude test scores 
might predict achievement better than has been
realized.


STATE LIMITATIONS ON LOCAL PUBLIC
SCHOOL EXPENDITURES IN THE 
UNITED STATES 


HARRY E. HULS 
South Dakota State College 
Brookings, South Dakota 


Statement of the Problem 





THE PROBLEM of state limitations on local 
public school expenditures in the United States 
arises from the fact that these limitations curb 
school facilities expansion and limit educational 
offerings. Therefore, it becomes necessary to 
determine which limitations are desirable and 
which are not, and to develop principles to guide 
the amending and forming of laws limiting these 
expenditures. 

The major objectives of this study are (1) to 
establish the items of public school expenditures 
which should be limited by law, (2) to develop 
some principles which should be used to guide 
the amending of old and the formation of new laws 
which limit local public school expenditures, and 
(3) to apply these findings to current Minnesota 
laws limiting public school expenditures. 


Method of Procedure 





For the purpose of developing principles, the 
literature on the subject of these limitations was 
reviewed. The following points of view were de- 
termined as representing the thinking of authors: 


1. General expenditure limitations were considered
   to be a function of the local government
   rather than of the state.

2. Tax limitations were opposed.

3. Debt limitations were favored.

4. Interest rate limitations were opposed.

5. Budgetary laws were considered desirable.

6. Statutory limitations were favored over
   constitutional limitations.


Next, the laws of the forty-eight states were 
assembled for the purpose of determining all the 
items limited by law. These were listed and in- 
cluded in a questionnaire along with principles 
whose origination is described below. 

Principles concerning limitations on expenditures
were developed based on two criteria for
their formation: (1) actual practice in the majority
of states, and (2) theory as represented
in literature, or both. These principles, and the
items of limitation found in the state laws, were
submitted for evaluation to two panels of experts.
These two panels were composed of:


1. All the state departments of education of the 
forty-eight states. 

2. All those professors of educational administration
   who met at the convention of the National
   Conference of Professors of Educational
   Administration at the University of
   Connecticut between August 21 and 27, 1955.


In this questionnaire, the statement was made 
that each item listed should not be limited by law. 
Both the eight principles and these items of limi- 
tation so stated were evaluated by the panels by 
checking the categories ‘‘agree’’, ‘‘agree with 
reservations’’, and ‘‘disagree’’. 

Eighty-seven percent (72 people) of the professors
of educational administration and ninety-one
percent (46 states) of the state departments of
education completed the questionnaire. These responses
were treated statistically to determine (1) whether
one of the three categories had a greater response
than either of the other two categories for the item
or principle under consideration, (2) whether this
particular arrangement of answers in categories
was different from chance or theoretical frequency
at the 5 percent level of significance, and (3)
whether the largest response category alone
contributed essentially to the deviation from chance
arrangement at the 5 percent level. Chance as
referred to in number 2 above is interpreted as
meaning the number of responses which would have
occurred in any one category if the total number of
responses had been divided evenly among the three
categories. Table I shows, for principle one, that
the total response was sixty-six and therefore the
chance response for any one of the
three categories was twenty-two.

Since these data were categorical in nature,
that is, enumerative and could not be ranked or
set in any order, the chi square test of significance
was used. The formula for testing whether
the arrangement of responses was significantly
different from chance is:

$$\chi^2 = \sum \frac{(\text{observed frequency} - \text{theoretical frequency})^2}{\text{theoretical frequency}}$$





An illustration of how this formula was applied
is shown in the analysis of the responses
of the professors of educational administration to
principle one (Table I).

The chi square table shows this figure of 40. 727 
with two degrees of freedom to occur with a prob- 
ability of less than .001. Therefore, at the pre- 
viously selected 5 percent level, the null hypothe- 
sis that the arrangement of responses on principle 
one came about by chance, can be rejected. This 
same type of analysis was used throughout for the 
responses of the two panels of experts to both the 
principles and the items of limitation. 
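The first analysis can be reproduced with a short calculation. The sketch below uses the three response counts shown in Table I for principle one and assumes, as the text states, that the chance frequency is the total divided evenly among the categories; the function name is hypothetical. The result agrees with the 40.727 reported.

```python
def chi_square_equal_chance(observed):
    """Chi square of observed counts against an even split of the total."""
    total = sum(observed)
    theoretical = total / len(observed)   # chance frequency per category
    return sum((fo - theoretical) ** 2 / theoretical for fo in observed)

# Agree, agree with reservations, disagree (Table I, principle one).
chi_square = chi_square_equal_chance([46, 6, 14])
print(round(chi_square, 3))   # compared with a chi square table at 2 d.f.
```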

The next step in the statistical analysis of these 
data was the testing of whether or not the largest 
response category contributed essentially to this 
deviation from chance arrangement, also at the 5 
percent level of significance. This could be solved 
directly from the 2 x 2 table where:


a = largest response          b = the theoretical
    category                      or chance distribution
                                  of the largest response
                                  category

c = the total responses       d = the sum of the
    in the other                  chance distributions
    two categories                of the other
                                  two categories


and the formula for direct solution of chi square
is:

$$\chi^2 = \frac{(ad - bc)^2 (a + b + c + d)}{(a + b)(c + d)(a + c)(b + d)}$$


Since there is a total number (N) of responses, 
the theoretical distribution of each category is 
(N/3). Also, since there is the same total num- 
ber (N), the two remaining observed frequencies 
may be represented by (N - the category to be 
tested) or (N - a). Therefore, the following rela- 
tionships exist: 


$$b = \frac{N}{3}, \qquad c = N - a, \qquad d = \frac{2N}{3}$$


Substituting these for b, c, and d in the above
formula, the following is derived in terms of a
and N only:

$$\chi^2 = \frac{2N(3a - N)^2}{(3a + N)(5N - 3a)}$$

This is of value in facilitating calculations in that 
the fundamental numbers can be calculated easily 
and directly for but three expressions: 2N, 3a, and 
5N. 
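That the shortened formula is equivalent to the 2 x 2 formula can be checked numerically. The sketch below substitutes b = N/3, c = N - a, and d = 2N/3 into the general formula and compares the result with the derived one, using the values N = 66 and a = 44 that appear in Table II; both function names are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """General 2 x 2 formula: (ad - bc)^2 (a+b+c+d) / ((a+b)(c+d)(a+c)(b+d))."""
    return ((a * d - b * c) ** 2 * (a + b + c + d)
            / ((a + b) * (c + d) * (a + c) * (b + d)))

def chi_square_largest_category(a, n):
    """Derived formula in a and N only: 2N(3a - N)^2 / ((3a + N)(5N - 3a))."""
    return 2 * n * (3 * a - n) ** 2 / ((3 * a + n) * (5 * n - 3 * a))

n, a = 66, 44   # values as entered in Table II
direct = chi_square_2x2(a, n / 3, n - a, 2 * n / 3)
short = chi_square_largest_category(a, n)
print(round(direct, 3), round(short, 3))
```

Both routes give the 14.667 shown in Table II, so only 2N, 3a, and 5N need to be worked out by hand.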

The use of this formula for chi square is illustrated
in Table II, using the responses to principle
one by the professors of educational administration.

Since ‘‘a’’ represents the responses in the
‘‘agree’’ category, and the chi square 14.667 with
one degree of freedom occurs by chance with a
probability of less than 5 percent, it can be assumed
that the ‘‘agree’’ category contributed most
to the difference from chance shown in the first
analysis. This analysis was used for the responses
to the principles and items of limitation as given
by both the professors of educational administration
and the state departments of education.

A further analysis was used to make a statisti- 
cal cross comparison between the responses of the 
two panels of experts used. It was necessary to 
determine whether or not the two distributions of 
responses to the three categories were in propor- 
tion or disproportion. The chi square test was 
made of the statistical hypothesis that these two 
panels may be regarded as having come from the 
same population with respect to their judgment of 
the principle or item under analysis. It is helpful 
in accepting one of the response categories to be 
able to show that the distribution of responses of 
both panels on the item are in agreement with each 
other. If it were found that the chi square was not
significant at the 5 percent level, then the state- 
ment could be made that the two distributions were 
not disproportionate. 

The formula for this chi square with two de- 
grees of freedom is: 


; 1 
chi square = — (ZaP-n 
q oa ( p) 


Illustrative analysis of principle one using this 
formula is shown in Table III. 

Since the chi square of 3.184 is not significant 
at the 5 percent level, the hypothesis that these 
two panels are from the same population with re- 
spect to this item cannot be rejected. 


Method of Choosing Items from the Ratings 
of Experts 


The following possibilities actually occurred 
and the acceptance of them was outlined as follows: 





1. Accept the item as stated where 
a. Both groups responded significantly in 





TABLE I


ANALYSIS OF PRINCIPLE ONE USING CHI SQUARE WITH TWO DEGREES
OF FREEDOM


Responses                      fo     fo - ft           (fo - ft)²

Agree                          46     46 - 22 =  24        576

Agree with reservations         6      6 - 22 = -16        256

Disagree                       14     14 - 22 =  -8         64

Total Responses                66     Σ(fo - ft)² = 896

Chance Responses               22


Chi square = Σ(fo - ft)² / ft = 896 / 22 = 40.727

TABLE II


CALCULATION OF CHI SQUARE USING DERIVED FORMULA


Item            N    2N    5N    a    3a   3a-N   (3a-N)²   3a+N   5N-3a   Chi square

Principle One   66   132   330   44   132   66     4356      198    198     14.667


Chi square = 2N(3a - N)² / [(3a + N)(5N - 3a)] = 14.667


      the ‘‘agree’’ category, and were from the
      same population.

   b. One group responded significantly in the
      ‘‘agree’’ category and the other did not
      respond significantly in any category and
      were from the same population.

2. Accept the converse of the item where

   a. Both groups responded significantly in
      the ‘‘disagree’’ category and were from
      the same population.

3. Use writer’s own judgment based on whatever
   other information is available

   a. Where neither group responded significantly
      in any category.

   b. Where both groups responded significantly
      in one category but were found to be
      from divergent populations.

4. Recognize the impossibility of analysis in
   this plan where

   a. Both groups responded significantly, but
      one responded in the ‘‘agree’’ category
      and the other in the ‘‘disagree’’ category,
      whether from the same or from
      divergent populations.
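The acceptance rules listed above amount to a small decision procedure, which may be sketched as follows. The function name, the argument names, and the returned labels are hypothetical; each argument for a panel is the category in which that panel responded significantly, or None if it responded significantly in no category. Combinations the list does not mention are reported as not covered rather than guessed at.

```python
from typing import Optional

def classify_item(panel_one: Optional[str], panel_two: Optional[str],
                  same_population: bool) -> str:
    """Apply the four acceptance rules to one principle or item."""
    categories = {panel_one, panel_two}
    # Rule 4a: the panels point in opposite directions.
    if categories == {"agree", "disagree"}:
        return "cannot analyze"
    # Rule 3a: neither panel responded significantly.
    if panel_one is None and panel_two is None:
        return "writer's judgment"
    if panel_one is not None and panel_one == panel_two:
        # Rule 3b: same category but divergent populations.
        if not same_population:
            return "writer's judgment"
        if panel_one == "agree":        # Rule 1a
            return "accept"
        if panel_one == "disagree":     # Rule 2a
            return "accept converse"
    # Rule 1b: one panel agrees, the other shows no significant category.
    if categories == {"agree", None} and same_population:
        return "accept"
    return "not covered"

print(classify_item("agree", "agree", True))
print(classify_item("agree", "disagree", False))
```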


Findings 


Applying the above criteria for acceptance,
all of the eight principles could be accepted with
the exception of principles two, three, and four,
where no significance was found in the responses
at the five percent level. These three principles
were restated using the writer’s own judgment
based on all available information.

Using the same criteria, all the limitations on 
expenditures could be analyzed with the exception 
of the limitation on expenditures for maintenance 
of the student away from home, school paid pen- 
sions, and salaries and expenses of officers of 
school building corporations. 

The principles and items as finally stated un- 
der these conditions were: 


Principles 


1. Total debt should be limited, rather than
   warrants, bonds, etc., being limited
   separately.

2. The only limit on total expenditures should
   be a budget voted upon by the people or a
   representative body of the people elected
   for that purpose, with dedicated funds being
   allocated as dedicated.

3. Transfers within funds of the budget should
   only be limited for transfers out of dedicated
   or capital outlay moneys.

4. Transfers between funds of the budget
   should be limited by state law for transfers
   out of dedicated or capital funds while
   other transfers should be limited by board
   action only.

5. Interest should not be limited as to rate or
   amount, but should be arrived at through
   bids.

6. Purchases above a set limit should be obtained
   by bids.

7. Any limitations on expenditures or borrowing
   should be statutory rather than constitutional.

8. No exceptions should be made to the laws
   limiting expenditures or bonds to suit certain
   districts, but if exceptions are necessary
   the entire law should be revised to fit
   all districts in the state.


Limitations 


1. Salaries and expenses should be limited by
   law for:
   a. District school board members.
   b. County school board members.
2. Salaries and expenses should not be limited
   by law for:
   a. Assistant county superintendent.
   b. County superintendent or supervisor.
   c. County truant officer.
3. The following should be limited by law:
   a. Long term debt (bonds, etc.).
   b. Short term debt (warrants, etc.).
   c. Total debt.
4. The following should not be limited by law:
   a. Tax levies.
   b. Interest.
5. The following general expenditures should
   not be limited by law:
   a. Purchases in general.
   b. Election costs.
   c. Publishing.
   d. Expenditures for real estate.
   e. Membership dues to organizations.
   f. Premiums on surety bonds.
   g. Total expenditures.
   h. Mileage.
   i. Library board salaries.
6. $Bronw sn. Aon items: 
a. Tuitions should be limited to actual cost 
of education per pupil. 
(The following items were not included because of 
the diametrically opposite points of view of the 
two panels of experts): 
b. Maintenance of students away from home. 
c. School paid pensions. 
(The following item was not included because it 
was rated inconclusively by the panels and there 
was little evidence to guide intelligent decision 
on its part): 
d. Salaries and expenses of officers of 
school building corporations. 


Minnesota laws were found to differ from these
findings in that only four groups out of the sixteen
limitations existing in Minnesota were found to
conform. They were:

1. Salaries and expenses of board members.

2. Debt limitations.

3. Contracts and purchases in general.

4. Tuition.


Of these, only the tuition laws completely con- 
formed to the applicable principles. In addition, 
Minnesota was found to lack any substantial budg- 
et law limiting expenditures. 





Recommendations 





In general, all states should examine their 
laws in the light of these findings and apply these 
principles, repealing those laws which should not 
exist and modifying those which are not in con- 
formity with the principles. 

Minnesota, in particular, should repeal all but
the four classes of laws which are acceptable in
terms of the findings, modify all of the remaining
limitations except the tuition laws, and add a budget
law in keeping with the principles outlined here.


WHICH OF two methods of teaching biological
science to non-science majors preparing for
teaching careers will result in better performance
on criterion measures of abilities (1) to recall
and apply biological facts and principles, and
(2) to use some of the inductive aspects of scientific
thinking? This was the problem investigated in
a recent quantitative study designed to shed further
light on a few aspects of the perennial problem
of teaching methods.1*

The question of what objectives are worthy of
the science teacher’s efforts was succinctly
summarized, according to Kenneth E. Anderson,2
in four pivotal ones:


1. Acquisition of factual information in science.

2. Understanding and application of the principles
   of science.

3. Understanding and application of the elements
   of the scientific method together with its
   associated attitudes, and

4. Skill in the basic tools peculiar to a specific
   science.


As far as the general education of college students
is concerned, all of these, Louis Heil3 says,
are based on the assumption that the understanding
and use of scientific procedures in meeting
needs in the basic aspects of living (also known as
problem-solving) is the most significant contribution
of science instruction. Many critics of education
have expressed much concern for the ‘‘proper’’
instruction of our young people majoring
in science. However, the future of science and
its achievements depends, to an unprecedented degree,
upon a scientifically literate citizenry in our
democracy. Such conditions justified the conduct
of this investigation and, therefore, indicated its
central importance as one part in the total current
educational picture.


* All footnotes will be found at end of article.

From a critical analysis of both quantitative
and qualitative methodological studies in the
literature of general education (mainly science
education) we may conclude, tentatively, that as far as


1. The acquisition of factual information or
   knowledge, the conventional teacher-controlled
   or ‘‘traditional’’ methods were
   superior;

2. Outcomes extending beyond those of factual
   information, such as the ability to apply
   scientific principles, to interpret data, and to
   draw conclusions from data, the evidence is
   not clear-cut as regards different instructional
   procedures (probably, in the majority
   of the studies, there seemed to be evidence
   in support of those methods that tried
   to provide learning situations designed for
   the direct attainment of these purposes);

3. The development of scientific attitudes,
   these were more likely to be attained through
   such methodologies as recognize and provide
   for their direct development rather
   than as concomitant learnings.


As a result of his experiences with schools, teachers,
and teaching, plus a review of the literature
in science education, the author (investigator)
became concerned with the problem of how to improve
his ability to present science in ways that
not only meet the students’ needs but also make it
a lively and interesting subject.

Accordingly, the question of how to teach towards
the attainment of such objectives was investigated
in this experiment by comparing two teaching
methods representing conflicting educational
philosophies. These are often referred to as
traditional, authoritarian, or teacher-centered, and
progressive, permissive, or student-centered.
Various aspects of the teaching-learning situation
were, therefore, operationally defined. The role
of the instructor and of the students in the
determination of objectives, the selection of content, the
planning of class activities, the evaluation of
learning, and other related aspects of the teaching
methods were differentiated to provide a contrast
in the two approaches. In general, the
teacher-centered method was characterized as
authoritarian; the teacher not only directed his
own activities in preparing and presenting the subject
matter, but also the students’ activities in assigning
the outside readings and other written
work which he required of each individual. By
contrast, the student-centered method was characterized
as permissive; the teacher shared responsibilities
with the class, and through cooperative
teacher-student planning made provision for
all aspects of the teaching-learning situation.

In the teacher-centered method, the classroom
procedure consisted of lectures and demonstrations
given entirely by the instructor. In the
student-centered method, the students, following
cooperative teacher-student planning sessions,
developed the content of the course by units. They
then selected topics or problems of personal interest
to them and worked on these in small groups.
Considerable use was made of audio-visual aids
such as biological preparations, charts, models,
films, etc.; these were supplied automatically to
the teacher-centered groups, but were made available
for use by small groups within the
student-centered method groups, if they so desired. Only
one textbook was used in the teacher-centered
method groups while six basic textbooks (one-sixth
of a section having any one textbook) plus a
wide variety of reference materials were used by
the student-centered method groups.

The following example may show how students 
in the contrasting methods groups were aided in 
developing a better understanding of the scientif- 
ic method of solving problems. 

In the experimental groups (student-centered) 
the students selected the name of a biological sci- 
entist whose work they wished to study. About 
six groups of from 4 to 7 students in each thus 
studied such men as William Harvey, Frederick 
Banting, Walter Reed, and others. They were 
guided in this group study by a suggested list of 
references and a sheet of questions which they 
used as a basis for drawing up a group report to 
present to the whole class.* This sheet raised 
such questions as the following: 


What was the scientist’s problem(s)? 
How did he become aware of it (them)? 
What use of authority did he make? 
What were his sampling procedures? 
What conclusions did he draw?
Were they justifiable?


Other questions related to the scientific method
of solving problems were also listed. Each
small group evaluated its own report, which was
also graded by the rest of the class.

In the control groups (teacher-centered) each
student was required to draw up a written report
in which an example of an unscientific approach
was contrasted with an example of a scientific method
of approach to solving problems. Many references
to examples of both types were given to the
students and also a guide sheet on the Basic Assumptions
of the Scientist, which, in essence, was an outline
of scientific method and attitude.5 The students
were to pick from their examples instances in which
the scientific method(s) were or were not used and
incorporate these into their written report. These
were graded on an individual basis by the instructor.
The validity of any conclusions drawn from an 
experiment and the generalizability of inferences 
resulting from analysis of data demand that the 
questions to which we seek answers be framed in 
mathematical terms as the testing of statistical 
hypotheses. A modern self-contained experiment 
should utilize, as fully as possible, the principles 
of randomization, replication, and the use of local 
controls in its design. 6 For these reasons it is 
important, in conducting such a study, that the no- 
vitiate avail himself of the counsel and guidance 
of an experienced research director, particularly 
in the early stages of planning the experiment. 
Further, the design and analysis of an experiment 
should be planned as complementary parts of the 
total investigation and appropriate for the data at 
hand. Otherwise we unnecessarily subject ourselves
to repeating the errors of others and of
conducting a so-called “post-mortem” analysis.
In order to gain familiarity and experience 
with the experimental or student-centered method 
of teaching and to focus on the details which differ- 
entiated it from the control or teacher-centered 
method, a pilot study was conducted with a 1955 
summer session class. This, along with the ex- 
perience gained in teaching and evaluating previ- 
ous years’ classes, enabled the investigator to 
conduct the crucial experiment with the students 
who registered for the Winter and Spring quarters 
of the 1955-56 academic year at Northern State 
Teachers College. The population consisted of all 
non-science majors attending the college over an 
undefined period of time. The samples of the pop- 
ulation investigated were formed from the total 
number registering for the Winter and Spring quar- 
ters. These 105 students were randomly assigned 
to one of four sections at the beginning of the ex- 
periment by sequentially numbering the students 
1, 2, 3, or 4 as they appeared to register and ar- 
bitrarily placing them in corresponding sections. 
Certain necessary exceptions were kept to a minimum.
Replication was provided for by having two
sections taught by the control method and the other
two sections taught by the experimental method.
The form of control was that of unrestricted
randomization in which the same instructor attempted
to teach both methods with equal zeal to duplicate
groups of similar subjects.
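
The assignment procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `assign_in_rotation` is hypothetical, the students are represented simply by their order of arrival, and the "necessary exceptions" mentioned in the text are ignored. Strictly speaking the scheme is a rotation over arrival order, which the study treats as equivalent to random assignment.

```python
from collections import Counter


def assign_in_rotation(n_students, n_sections=4):
    """Give each student, in order of arrival, the next section number.

    Students are numbered 1, 2, 3, 4, 1, 2, ... as they register,
    and the number is the section they are placed in.
    """
    return [(i % n_sections) + 1 for i in range(n_students)]


# 105 students registering for four sections, as in the study.
sections = assign_in_rotation(105)
section_sizes = Counter(sections)
print(sorted(section_sizes.items()))  # [(1, 27), (2, 26), (3, 26), (4, 26)]
```

The resulting section sizes differ by at most one student; the sizes actually obtained in the study are not reported in this passage and may have differed because of the exceptions and dropouts.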

The author’s Survey of Science course is a
year-long sequence beginning in the Fall quarter
with an introduction to the physical sciences which
includes units in astronomy, physics and chemistry.
During the Winter and Spring quarters selected
areas of the biological sciences are taken up.
Scope and sequence are indicated in the following
outline of units.


Unit I: How do Scientists Solve Problems? (A unit
on the scientific method of problem solving.)

Unit II: What are Some of the Basic Characteristics
Common to all Living Things? (A unit on
the cell concept, cell structure, protoplasm,
theories as to the origin of life, characteristics
of life, and photosynthesis.)

Unit III: What are the Form and Function of Our
Bodies? (A unit dealing with the skeleton and
muscles, nutrition and diet, digestion, circulation,
respiration and excretion.)

Unit IV: What is the Biological Basis of Behavior?
(A unit on the nervous and endocrine systems
with emphasis on learning and mental
health.)

Unit V: How is the Continuation of Life Forms Ensured?
(A unit on plant, animal, and human
reproduction and sex with emphasis on problems
of adolescence, young adulthood, and
family living.)

Unit VI: How are the Characteristics of Living
Things Transmitted? (A unit on the biological
basis of heredity including the concept of evolution
of life forms.)

Unit VII: What Can We Do to Improve Personal
and Public Health? (A unit in which modern
individual and social health measures are studied.)

Unit VIII: How Can Man Help Preserve a Balance
in Nature? (A unit on ecology and conservation
with emphasis on local problems.)


Because of lack of time the final unit was omit- 
ted from the course. 

Three evaluation instruments were used as cri- 
terion measures to assess the difference between 
the initial and final status of the students in achiev- 
ing the objectives of instruction under the con- 
trasting treatments. These included tests for 
measuring (1) the ability to recall and apply bio- 
logical facts and principles, (2) the ability to use 
some of the inductive aspects in scientific thinking, 
and (3) knowledge of vocabulary and comprehension
of reading passages and their interpretation
in biology. The last mentioned test, given as an 
outside criterion, was the Cooperative Biology Test. 





The subject matter tests for evaluating stu- 
dents’ achievement in ability to recall and apply 
biological facts and principles for both Winter and 
Spring quarters were devised by the author. These 
were built as equivalent parallel forms from the responses
of similar students to previous administrations
of the test items. Both sets were built
from items analyzed after the method of Davis.7
Reliability coefficients were determined for each 
administration by either the Maximum Likelihood 
estimate method for alternate forms or by the 
split-half method with application of the Spearman- 
Brown Prophecy Formula for single forms. 8 The 
former method, as applied to the Winter quarter 
(174B) pre-test results, gave reliability coeffici- 
ents ranging from 0.54 to 0.75. Using the latter 
method with alternate forms given as post-tests 
resulted in reliability coefficients ranging from 
0.75 to 0.88. When the same form, originally
given as a pre-test, was administered as a re-test
two months after completion of Winter quarter
instruction, the reliability coefficients were
found to range between 0.66 and 0.88. The Spring
quarter (174C) pre-test results gave reliability co- 
efficients between 0.59 and 0.74, while those of 
alternate forms, given as a final, ranged between 
0.68 and 0.86. Therefore these tests were found 
to possess a reasonably high degree of internal 
consistency. 
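
For readers unfamiliar with the split-half procedure named above, the Spearman-Brown step can be sketched as follows. The function name `spearman_brown` and the half-test correlation of 0.60 are illustrative assumptions; they are not values taken from the study.

```python
def spearman_brown(r_half, k=2.0):
    """Estimate the reliability of a test lengthened k times.

    r_half is the correlation between the two halves of the test;
    with k = 2 this is the usual split-half correction.
    """
    return k * r_half / (1.0 + (k - 1.0) * r_half)


# Illustrative value only: a correlation of 0.60 between halves.
full_test_reliability = spearman_brown(0.60)
print(round(full_test_reliability, 2))  # 0.75
```

The correction always raises a positive half-test correlation, which is why coefficients for single forms are reported after the formula has been applied rather than before.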

Various aspects of problem-solving abilities 
were measured by using Mary A. H. Burmester’s 
Ability to Think Scientifically Test IA.9 As pre- 
viously noted the Cooperative Biology Test, Forms 
X and Y, was used as an outside criterion test. 

Other tests used as sources of information on 
students’ backgrounds included the Otis Quick 
Scoring Self Administering Test of Mental Ability, 
the Cooperative English Test Part C2, Reading 
Comprehension, and the Cooperative General 
Achievement Test, Test III, Mathematics. 

In the analysis of the data descriptive statistics 
were used, initially, to present test results ob- 
tained under the contrasting treatments (see Table 
I). The significance of the differences between 
the means of pre-tests and post-tests, which were 
correlated measures, was determined by using ap- 
propriate t-tests. After testing the assumptions 
underlying the pooling of data, the results of the 
replicated teacher-centered method group samples 
were combined, as also were the results of the 
replicated student-centered method group samples, 
in order to provide additional degrees of freedom 
for estimating the error component. Statistical in- 
ferences were then drawn on the basis of various 
tests of significance for several specific null hy- 
potheses accompanying three different statistical 
analyses. These included: (1) a two-way analysis 
of variance (two treatments by three levels of in- 
telligence), (2) F and t-tests for determining the 
significance of differences between the variances 
and means of boys and girls in and between the contrasting
treatments, and (3) a one-way analysis of variance and
covariance of final scores holding pre-test and Otis
intelligence test scores constant.


TABLE I


RANGE, MEAN, STANDARD DEVIATION AND MEAN GAIN SCORE ON INITIAL AND FINAL
ADMINISTRATION OF CRITERION TESTS ACHIEVED IN SAMPLES OF EACH METHOD AND
IN EACH METHOD’S SAMPLES COMBINED


Criterion       Sample            Range
Test            Treatment   Pre-test   Final test

174B            2X          12-49      8-62
Wtr. Qtr.       6X          16-49      22-59
                Exp.        12-49      8-62
                Con.        11-54      19-69
                3C          3-47       9-69
                5C          15-54      19-65

                2X          8-62       14-58
                6X          22-59      12-62
                Exp.        8-62       12-62
                Con.        19-69      6-64
                3C          22-69      22-64
                5C          19-65      6-60

                2X          13-48      18-58
                6X          19-56      23-68
                Exp.        13-56      18-68
                Con.        15-53      21-73
                3C          16-53      21-68
                5C          15-49      30-73

                2X          35-73      33-84
                6X          30-80      40-91
                Exp.        30-80      33-91
                Con.        32-77      38-89
                3C          32-85      38-89
                5C          33-77      40-86

                2X          12-67      5-40
                6X          18-60      7-49
                Exp.        12-67      5-49
                Con.        9-66       8-56
                3C          15-61      13-45

[The mean, standard deviation and mean gain score columns, the labels of the
remaining criterion tests, and the footnote markers are illegible in the source.]

Number indicates period class meets and letter X or C following designates these as
experimental (Exp.) or control (Con.).

Number in combined group may differ from sum of method samples due to dropouts or
incomplete data.

Hypothesis of equal variances between pre-test and post-test is in the region of doubt
(.05 > P > .01).

Hypothesis of equal variances between pre-test and post-test is rejected (.02 > P > .01);
in all other cases it is accepted (P > .05).

Hypothesis of equal means between pre-test and post-test is accepted (P > .05); in all
other cases it is rejected.

The null hypotheses tested for each criterion 
measure stated that there were no significant dif- 
ferences between college students who took gener- 
al education biological science instruction at North- 
ern State Teachers College under either teaching 
method as regards: 


1. Mean initial and mean final performance.

2. Mean final performance of upper, middle,
and lower third levels of intelligence, assuming
no interaction between treatments and
intelligence levels.

3. Mean initial and/or mean final performance
of boys and girls in and between treatments.

4. Mean final performance when inequalities
of pre-test and Otis test scores are partialled
out.


The results of these analyses showed that: 


For hypothesis one, generally, (1) mean sub- 
ject-matter performance was increased signifi- 
cantly in both treatments, (2) differential reten- 
tion of subject-matter did not persist upon re-test 
two months after Winter quarter instruction, and 
(3) mean scientific thinking ability performance 
was increased significantly in both treatments. 

For hypothesis two, (1) no significant interac- 
tion between treatments and intelligence levels, 
(2) significant differences in mean performance 
among intelligence levels as was expected, and 
(3) no significant treatment differences in mean 
performance among intelligence levels. For sci- 
entific thinking ability, the last conclusion was 
confounded since the subclass variances were not 
homogeneous. The fulfillment of this assumption 
is, however, essential to the valid interpretation 
of such results. 

For hypothesis three, (1) the mean initial and 
final performance of girls was consistently, and, 
in most cases, significantly higher than that of 
the boys in both treatments, (2) on one criterion, 
the Spring quarter (174C) test of ability to recall 
and apply biological facts and principles, the 
mean performance of teacher-centered method 
group boys was significantly higher than that of 
the student-centered method group boys, and (3) 
there was no significantly different mean perform- 
ance between boys and girls under either treat- 
ment. 

For hypothesis four, (1) the mean subject-matter
performance of the teacher-centered method
group students was higher than that of the
student-centered method group students, and (2) the
mean performance of students in some of the
inductive aspects of scientific thinking ability was
not significantly different under either
treatment.10 (See Table II).


A TABLE OF NORMAL DISTRIBUTION FREQUENCIES
FOR SELECTED NUMBERS OF
CLASS INTERVALS AND SAMPLE SIZES


WILLIAM J. MOONAN* 
U. S. Naval Personnel Research Field Activity 
San Diego, California 


1. Introduction 





THE PROBLEM of “fitting” a normal distribution
to an observed frequency distribution is
commonly treated in statistical textbooks. A re- 
lated and easier problem is concerned with find- 
ing frequencies for a normal distribution with a 
specified mean, standard deviation and number 
of class intervals. The latter problem is fre- 
quently encountered by research workers in so- 
cial science fields when basic information is col- 
lected from subjects who are asked to sort a set 
of items, phrases or sentences into categories. 
This procedure is frequently referred to as a
“forced-distribution” technique.

As an example, a researcher devises 40 items 
dealing with attitudes toward a Naval career. He 
may wish a sample of subjects who are either ca- 
reer or non-career prone to partition these 40 
items into five groups such that the frequencies
in each group correspond as nearly as possible
to those derived from a normal probability distribution
function. In this way each item can be given
a number which is associated with the category
to which it is assigned by the subjects in each of 
the career groups. The ultimate object of the an- 
alysis is to contrast the mean item score between 
groups in order to determine items which dis- 
criminate effectively between the career groups. 

In order to find the appropriate frequencies 
for each of the five categories it is necessary to 
specify the range of the scores to be applied to 
the items and the widths of the class intervals. 
It is common practice to make the scores vary
from one to a number corresponding to the number
of categories, in this case, five, and to make
all class intervals of equal size and equal to one.
We shall adhere to these conventions. With these
specifications the range of these scores will be
five (i.e., 5.5 - 0.5), and because of symmetry,
the mean score will be necessarily three.

The choice of a standard deviation is not altogether
arbitrary. For instance, a σ = 10 would
obviously be inappropriate for the problem presented
above. Certainly a σ = 1 would not seem
unreasonable, nor for that matter, would σ = 3/4.
What σ should be chosen? There is an objective
way of determining the answer to this question.
The purpose of this note is to show interested
readers how to find the appropriate σ and to provide
a table of frequencies for various numbers
of categories and sample sizes. In this discussion,
sample size refers not to the number of
subjects used in the study, but rather to the number
of items to be assigned in all categories of
the distribution function, in other words, the total
frequency.


2. Determining the Frequencies 





It is generally known that the average range of
scores of samples taken from a normal distribution
depends upon the sample size. Various rules
of thumb have been advanced to indicate how many
σ’s are included within the range for various sample
sizes. However, Tippett (4) in 1925 derived
the appropriate statistical answer and his table of
the ratio, Range/σ, for various sample sizes appears
in several places (3, 4, 5). Unfortunately,
the existence of this table and its use for the problem
at hand are not widely known.

The table is entitled “Mean Range in Normal
Samples of Size n.” As the title indicates, the
table provides average values of ranges in samples
of size n from a normal distribution function
with zero mean and unit variance. By using
the following equation, we can determine the
appropriate standard deviation for our distribution
by solving for σ.


$$\frac{\text{Range}}{\sigma} = W, \tag{1}$$


where “Range” is the range we specify for the
distribution and W is the table entry which, of
course, depends upon n, the number of items.
The table given in (3) gives W’s for n = 2(1)499
and for n = 500(10)1000. Tippett’s table is useful
for the problem posed since we can specify
that the average range desired should be that
specified by the difference between the upper end
of the largest class interval and the lower end of
the lowest class interval. In such an event, the
table can be immediately used to determine σ.


*The opinions expressed are solely those of the author and are in no way official; nor are they to be
construed as representing those of the U.S. Naval Personnel Research Field Activity or Bureau of Personnel.


TABLE II


A TABLE OF NORMAL DISTRIBUTION FREQUENCIES FOR SELECTED NUMBERS OF CLASS
INTERVALS AND SAMPLE SIZES WITH SOME CORRESPONDING STATISTICS


TABLE I


COMPUTATIONS FOR DETERMINING NORMAL FREQUENCIES FOR A DISTRIBUTION
WITH A RANGE EQUAL TO 5 AND A SAMPLE SIZE OF 40


Class Interval    Midpoint    Z(2)    Φ[Z(2)]

-∞ to 1.5         1
1.5 to 2.5        2
2.5 to 3.5        3
3.5 to 4.5        4
4.5 to ∞          5

For the example considered above n = 40 and
W = 4.32156 or 4.322, say. Therefore σ = 1.157
since the range was 5. The probability of an
observation from a normal distribution, X, being
less than or equal to the cth class limit is


$$P\{X \le Z(c) + \Delta Z/2\} = \Phi\left(\frac{Z(c) + \Delta Z/2 - \mu}{\sigma}\right) \tag{2}$$


and the frequency of observations belonging to the
cth class interval is


$$np(c) = n\,\Phi\left(\frac{Z(c) + \Delta Z/2 - \mu}{\sigma}\right) - n\,\Phi\left(\frac{Z(c) - \Delta Z/2 - \mu}{\sigma}\right), \tag{3}$$

$$np(c) = n\,\Phi[Z(1)] - n\,\Phi[Z(2)],$$

where
n is the size of the sample,
Z(c) is the midpoint of the cth class interval,
ΔZ is the width of the class interval,
μ is the mean of the population,
σ is the standard deviation of the population,
Φ is the standardized cumulative normal distribution
function.
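
Equations (1) and (3) can be checked numerically for the worked example. The sketch below is offered only as an illustration under stated assumptions: the function names `phi` and `theoretical_frequencies` are hypothetical, class intervals are taken to have unit width with midpoints 1, 2, ..., k, and the standardized cumulative normal is evaluated with the error function rather than read from a printed table.

```python
import math


def phi(z):
    """Standardized cumulative normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def theoretical_frequencies(n, k, w):
    """Return sigma and the theoretical frequencies n p(c) for k unit classes.

    n is the total frequency, k the number of class intervals, and w the
    mean-range table entry W for samples of size n.
    """
    sigma = k / w                 # equation (1) with Range = k
    mu = (k + 1) / 2.0            # mean score, by symmetry
    # Phi at the lower limit of each class; the first lower limit is -infinity.
    lower = [0.0] + [phi((c - 0.5 - mu) / sigma) for c in range(2, k + 1)]
    # Phi at the upper limit of each class; the last upper limit is +infinity.
    upper = lower[1:] + [1.0]
    return sigma, [n * (u - l) for l, u in zip(lower, upper)]


sigma, freqs = theoretical_frequencies(n=40, k=5, w=4.322)
print(round(sigma, 3))            # 1.157
print([round(f) for f in freqs])  # [4, 9, 13, 9, 4]
```

The rounded frequencies sum to 39 rather than 40, which is the situation the text goes on to discuss.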


The numerical method of computing the final
frequencies, f, will now be described. First of
all, the class intervals and their corresponding
mid-points are recorded in the first two columns
of Table I. Then the values of the mid-points,
the half-width of the class interval and the population
mean are substituted in the expressions
for Z(2) as defined in equation (3). This is done
for all class intervals except the first (or smallest).
Since the tails of the normal distribution
extend to -∞ and +∞ the first and last class
intervals are not finite. The computing procedure
assumes that the first midpoint is not 1, but -∞,
and therefore Z(2) for this particular interval is
-∞. However, the computing procedure does
utilize the 5 for the highest mid-point and computes
a finite Z(2). In effect this amounts to computing
a standardized variable for the point at the
upper limit of the next to the largest class
interval.

Having computed the Z(2)’s, the next task is to
refer these values to a table of the cumulative
normal distribution. Table II in Hald’s Statistical
Tables and Formulas (1) is admirably suited to
this problem since entries for both positive and
negative Z(2)’s (u’s in Hald’s notation) are provided.
A more extended table is provided by Kelly
(2). Kelly’s table was used to compute the Z(2)’s
for Table II.

Of course the cumulative normal distribution
function for Z(2) = -∞ yields a Φ[Z(2)] of zero.

Each Φ[Z(2)] is next subtracted from the
Φ[Z(2)] associated with the next largest class
interval. These differences are called p(c)’s and
are recorded in the fourth column. The last p(c)
is determined by subtracting the last Φ[Z(2)]
from unity. The p(c)’s are then multiplied by n to
give the theoretical frequencies, n p(c). Operational
work demands that the final frequencies be
integers and therefore rounding the n p(c)’s is
necessary. This rounding may result in a total
frequency 1 or 2 units different from n. This means
that the computer will have to exercise his discretion
in adjusting the frequencies so that the total
sample size is n. This problem occurs with the
data given in Table I. After rounding, the total
frequency was 39 and this meant that a unit adjustment
had to be made. In order to maintain symmetry,
the frequencies on the flanks of the distribution
must remain inviolate or be changed symmetrically.
It was decided to increase the
central frequency by 1. This yielded the frequencies
listed in the f column. Rounding and adjusting
the frequencies means that the original σ used
to derive the frequencies will not be maintained.
However, it will ordinarily be only slightly affected.
In the present case, the σ for the final distribution
is 1.118 whereas originally it was assumed
to be 1.157. The computation of the final frequencies
can be shortened by taking advantage of their
symmetry. For a small number of class intervals
the saving is not too great, however. Besides,
calculating all frequencies serves as a check on
the calculations since symmetry must be obtained.
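
The adjustment step and its effect on σ can be illustrated as follows. This is a sketch under stated assumptions, not the author's procedure in full: the helper names `adjust_to_total` and `grouped_sd` are hypothetical, the rounded frequencies 4, 9, 13, 9, 4 are recomputed from equations (1) and (3) rather than copied from Table I, and only the case of an odd number of classes (where a single central class exists) is handled.

```python
import math


def adjust_to_total(rounded, n):
    """Restore the total to n by changing only the central frequency.

    Changing the central class alone keeps the distribution symmetric.
    """
    if len(rounded) % 2 == 0:
        raise ValueError("this sketch handles an odd number of classes only")
    adjusted = list(rounded)
    adjusted[len(adjusted) // 2] += n - sum(adjusted)
    return adjusted


def grouped_sd(freqs):
    """Standard deviation of scores 1, 2, ..., k weighted by frequency."""
    n = sum(freqs)
    scores = range(1, len(freqs) + 1)
    mean = sum(s * f for s, f in zip(scores, freqs)) / n
    variance = sum(f * (s - mean) ** 2 for s, f in zip(scores, freqs)) / n
    return math.sqrt(variance)


rounded = [4, 9, 13, 9, 4]            # rounded n p(c); the total is 39
final = adjust_to_total(rounded, 40)
print(final)                          # [4, 9, 14, 9, 4]
print(round(grouped_sd(final), 3))    # 1.118
```

The final standard deviation of 1.118 agrees with the value quoted in the text, as against the 1.157 originally assumed.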
Table II presents a set of frequencies for a
wide variety of class intervals and sample sizes.
This information should be suitable for most
operational circumstances. Nevertheless, the general
method for determining the frequencies is given
so that special cases can be evaluated with ease.


THE USE OF PUPIL ACCOMPLICES TO INVESTIGATE
TEACHER BEHAVIOR


EVAN R. KEISLAR and JOHN D. McNEIL 
University of California at Los Angeles 


A TEACHER’S behavior often is a function of the
way his pupils respond. If so, he will adopt certain
ways of teaching and reject others depending
upon his pupils’ reactions. In order to study this
important effect of pupil upon teacher, it is
desirable to manipulate the behavior of pupils in a
predetermined fashion. This study explored the
possibility that such experimental conditions
could be established by using pupil accomplices.
Apart from this methodological inquiry, the major
hypothesis of this study was that teachers
reliably differ in the extent to which they find pupil
enjoyment a reinforcement as compared with pupil
gain in achievement. A minor hypothesis was
that such a variable is significantly related to
other verbal behavior of the teacher; specifically,
his score on a dogmatism test, his expressed
verbal attitudes toward teaching, and his choice
of phrases used in describing his pupils.


General Procedure 





Each teacher was instructed first to use two
methods in teaching pupils individually and subsequently
to select and use in continuing instruction
the one of the two methods deemed more appropriate.
Pupils were previously instructed to
show greater enjoyment for one method but more 
gain in achievement by the other. A balanced de- 
sign was used so that each teacher taught two 
boys and two girls, two of whom showed enjoy- 
ment of the first method and two of whom showed 
enjoyment of the second. In this way the variable 
under study is independent of the method itself 
and the sex of the pupil. 


Subjects 


The subjects of the study were forty teachers
in training enrolled in a teaching methods class
at the University of California, Los Angeles.
Each day for five days a different group of eight
teachers appeared for an eighty-minute session at
the University Elementary School ostensibly to
“assist” in an experiment with teaching methods.


Instructions to Subjects 











Upon their arrival at the school, these teachers
were given a twenty-minute briefing. Instructions
were modified slightly on the third day and
again on the fourth so that the study essentially
involved three different experimental conditions as
discussed later. In general, the subjects were
told that they were to assist in a study of the
“appropriateness” of two teaching methods, a visual
and a kinesthetic method, in relation to individual
pupil differences. Teachers were advised that
during the next hour four pupils would appear for
tutoring, one at a time. Each pupil would bring
with him two packets of cards on which were written
particular spelling words selected especially
for him.

The teachers were instructed to teach the pupil 
the ten spelling words in the first packet by the 
visual method and the ten words in the second by 
the kinesthetic. During the session the pupil would
be required to spell the words from the two packets.
Those words misspelled by the pupil were to
be set aside and taught in a third attempt using the
method which appeared most “appropriate.” After
each pupil was dismissed, the teacher was to record
the method chosen as appropriate for the child
as well as his “general impressions” of the pupil.
All instructions in detailed form were both read 
and taken for reference by the teachers who were 
sent to individual rooms where they awaited the 
arrival of the pupils. 


* Appreciation is expressed for the cooperation of the staff and pupils of the University Elementary
School, University of California, Los Angeles, and to David Kagan who assisted with the study.


Pupil Accomplices and Their Instructions


Eight sixth grade pupils were the accomplices 
for the entire experiment. Pupils were instruct- 
ed to cooperate with the teacher in all respects 
but to respond differently to the kinesthetic and 
visual methods. Half of the pupils were told to 
show obvious pleasure and to misspell certain 
words during the presentation of the lesson by the 
visual method. They were to show no enjoyment 
of the kinesthetic method but to misspell fewer 
words taught by this method. The other half of 
the pupils were told to do the opposite, i.e., to
show enjoyment of the kinesthetic method but to
misspell fewer words by the visual. In a preliminary
training session of approximately half an
hour’s duration, each pupil rehearsed his role and
practiced the exact words he was to misspell.
Each pupil was assigned daily to four teachers.
After each teaching situation, pupils reported to 
the experimenters the method selected by the 
teacher. 


Experimental Conditions 


As implied earlier, three experimental conditions
were used during the study. In Condition I,
in effect for the first two days, pupil accomplices
misspelled three more words by one method than
by the other, and the experimenters gave no
statement to the teachers regarding the basis for
selecting the “appropriate” method. As indicated
in Table I, the vast majority of the teachers
selected the method which resulted in greater
achievement gain. Therefore, on the third day
Condition II was established, wherein pupil
accomplices misspelled only two more words by one
method and the wording of the instructions was
changed slightly to make the criterion of spelling
improvement less obvious.

Inasmuch as the results under Condition II still
indicated a skewed distribution, for the last two
days an additional change was made in the
instructions. Here, under Condition III, the teachers
were instructed to judge the “appropriateness”
of the method in terms of both “long and
short term consequences” and what “would be
better for the pupil in view of his emotional
responses, achievement, mental stumbling blocks,
physical behavior and the like.”


Results 


As indicated in Table I, the variability among
the teachers under Conditions I and II was not large
enough to justify a statistical test of the reliability
of this measure. However, under Condition
III, half of the sixty-four choices were made in
the direction of student preference, warranting a
test of significance of teacher consistency. The
scores obtained from the first and third choices of
the sixteen teachers in Condition III were compared
with their scores from the second and fourth.
Twelve of the sixteen cases were found to be above
or below the median on both pairs. Using the
one-tailed sign test, the null hypothesis is rejected at
the .05 level. This permits the conclusion that
under Condition III of this study, teachers reliably
differ in the extent to which they find pupil enjoyment
as compared with pupil gain in achievement
the more important reinforcement.
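
The sign test reported above can be reproduced from the binomial distribution. The sketch below assumes that each of the sixteen teachers has an even chance of falling on the same side of the median on both pairs under the null hypothesis; the function name `sign_test_one_tailed` is hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def sign_test_one_tailed(agreements, n):
    """Probability of at least `agreements` successes in n fair trials."""
    return sum(comb(n, k) for k in range(agreements, n + 1)) / 2 ** n


# Twelve of sixteen teachers were consistent across the two pairs of choices.
p_value = sign_test_one_tailed(12, 16)
print(round(p_value, 4))  # 0.0384
```

A probability of about .038 is below .05, which is consistent with the rejection of the null hypothesis at the .05 level stated in the text.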

In order to discover the correlates of this measure,
those teachers in each of the three experimental
groups who were above their respective
group medians were compared on several bases
with those below these group medians. No differences
were found between the high and low groups
on the Rokeach Scale of Dogmatism.** Further,
no differences were found with respect to the
frequency with which the groups used task-centered
or emotional phrases in their descriptions of pupils,
these phrases having been so classified by
four judges. The two groups of teachers were also
compared with respect to their answers on fifteen
two-choice items dealing with attitudes
toward teaching. Only one difference significant
at the required .001 level was found: When asked
to choose the better solution to parent criticism
of the school, teachers who were more influenced
by student preferences checked cooperative planning
and improvement of interpersonal relations,
while those teachers who were influenced by pupil
gain in spelling checked independent study and
decision-making by school authorities.

Questionnaires administered to the teachers at 
the conclusion of the study revealed that only one 
teacher of the forty entertained the possibility of 
coached behavior on the part of the pupils. Prac- 
tically all of the teachers regarded their participa- 
tion a realistic and valuable experience. Most of 
the successful control of this teaching environment 
was attributable to the role-playing ability of the 
pupil accomplices. These pupils played their 
parts with zest every day, their enthusiasm on 
the fifth day being as high as on the first even
though they were each completing twenty such
teaching sessions. Agreement of reports from
pupils and teachers concerning the method selected
was almost perfect, pupils and teachers agreeing
on all but one of the one hundred and sixty
independently made reports.


Conclusions 


In the teaching of spelling where no criteria
for selection of method are suggested, most
teachers are influenced more by pupil spelling
gain than by pupil preference. This is probably
because spelling to most teachers has always
been associated with the criterion of achievement.


** Rokeach, Milton, “Political and Religious Dogmatism: An Alternative to the Authoritarian
Personality,” Psychological Monographs: General and Applied, Vol. 70, No. 18, 1956, pp. 1-43.


On the other hand, where multiple criteria for 
selection of method are mentioned, teachers dif- 
fer reliably in the extent to which they are influ- 
enced by pupil enjoyment as contrasted with pupil 
gain in achievement. While itistempting to 
speak of these two groups of teachers as being 
either ‘‘child-centered’’ or ‘‘ subject-centered’’, 
no such interpretation has been drawn since these 





terms themselves are in need of clear definition. 
There were many teachers who yielded to pupil 
preference in the experiment, yet in their com- 
ments gave the clear impression of placing more 
importance upon subject mastery. 

It would appear that pupil accomplices can be 
effectively used in establishing experimental con- 
ditions for the study of teacher behavior. The tech- 
nique has merit as a research tool, making possible 
realistic but controlled classroom conditions. 
A CONTROL CHART FOR ERRORS IN IBM
TEST SCORING MACHINES


HARVEY F. DINGMAN, WILLIAM G. HOYT,** KENNETH F. THOMSON 
Personnel Research Branch, TAGO 
Department of the Army 


IN SCORING test papers with the IBM Test Scoring Machine, it is
well known that variations in humidity, voltage, and age and
condition of answer sheets can affect the scoring response of the
machine. Because of the variability in the scores obtained, some
form of checking procedure is followed by most users of the scoring
machine. The checking usually consists of a complete rescoring of
papers or spot checks. The extensiveness of spot checking depends
in part upon local tradition and in part upon the kind of precision
required for the job at hand. Checking of scores costs money or
time as does any other quality control measure. Further, neither
local tradition nor any arbitrary “kind” of precision is enough to
provide information sufficient to achieve economical scoring
checks.

Since checking answer sheet scoring is exactly analogous to
inspecting industrial production, it was decided to investigate one
of the statistical quality control devices used in industry.

Kimball (1) has proposed a number of sampling plans for analysis of
errors in scoring IBM answer sheets. However, none of the plans in
Kimball's paper are presented on the basis of practical experience.
It is proposed in this paper to present checking procedures based
upon regions that seem useful empirically.

In a study of the comparability of hand scoring of IBM answer
sheets, the agreement between five machine scorers was tabulated.
(See the table “Amount of Agreement Among Five Machine Scorers.”)
Thus, while there is far from perfect agreement among the scorers,
there is very good agreement if the criterion of ± 1 point is used
to define agreement. As a matter of fact, the mean number of
agreements within ± 1 point is 96.0 and the Standard Deviation is
1.95 agreements.

Since this scoring was done from a random group of papers and the
scoring was done carefully, 96 percent agreement (agreement between
two scores within ± 1 point) seems to be an attainable degree of
agreement between scorers. This seems an acceptable standard of
agreement to require as a criterion of acceptability for a set of
scores derived from rescoring of IBM answer sheets. In different
situations, different standards will be necessary. In very short
tests where important decisions are to be based on scores within
two or three points of each other, obviously more rigorous
standards must be used. In the situation where there are large
samples, long tests, and somewhat flexible procedures, 96 percent
agreement seems adequate. For practical considerations, one would
not care to use a scoring that agreed less than 90 percent with
another scoring. Actually, 2-1/2 S.D. down from 96 percent
agreement is 91 percent agreement. This seems to be an acceptable
bound for rejecting scorings. If a group of scorings of papers from
a large sample showed agreement between 91 percent and 96 percent
between two scorings of the same paper, more papers would be
examined before accepting or rejecting the scorings of the larger
sample.
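
The mean, the Standard Deviation, and the 2-1/2 S.D. bound quoted above can be rechecked from the ten pairwise counts in the within-one-point half of the table. The Python sketch below is illustrative only. It assumes the reported S.D. was computed with divisor n (the population form), since that choice reproduces the published 1.95; the variable names are not from the original report.

```python
from statistics import mean, pstdev

# Pairwise agreement counts within +/- 1 point (out of 100 papers),
# read from the lower triangle of the table for the five machine scorers.
within_one = [95, 94, 95, 99, 98, 93, 97, 98, 94, 97]

avg = mean(within_one)            # mean number of agreements
sd = pstdev(within_one)           # assumption: population S.D. (divisor n)
lower_bound = avg - 2.5 * sd      # "2-1/2 S.D. down" from the mean

print(avg, round(sd, 2), round(lower_bound, 1))
```

Using the sample form (divisor n - 1) would give a slightly larger S.D., so the choice of divisor is worth stating when the bound is recomputed for other data.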

If we consider each test paper as a random sample from a population
of papers, and we rescore the paper to see if the original score on
the paper falls in the limits of error (± 1 score point), then we
make a yes-no decision whether to accept the score on the paper as
correct or not. For a group of such papers we could use the
binomial distribution to test the hypothesis that the population
from which the group came possessed the quality, “96 percent of all
the papers if rescored would fall in the limit of error (± 1 score
point).”
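
As a rough illustration of such a fixed-sample binomial test, the sketch below computes the probability of finding at least a given number of out-of-limit papers when the true proportion of such papers is .04. The group size of 100 and the observed counts are hypothetical values chosen only for the example, and the function name is not from the original report.

```python
from math import comb

def binom_upper_tail(n, k, p):
    """Return P(X >= k) for X distributed Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

p_defective = 0.04   # hypothesized proportion of papers outside +/- 1 point
n_papers = 100       # hypothetical number of rescored papers

# Hypothetical observed counts of out-of-limit papers.
for observed in (4, 8, 12):
    print(observed, binom_upper_tail(n_papers, observed, p_defective))
```

A small tail probability for the observed count would argue against the hypothesis that 96 percent of the papers fall within the limit of error.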

The sequential probability ratio test for the mean of a binomial
distribution as presented in Wald (2) seems to provide a solution
to the problem. If we let $P_0$ be the lower bound of the
proportion of defectives in an acceptable group, and $P_1$ be the
upper bound, then from the preceding paragraphs $P_0 = .04$ and
$P_1 = .09$. The true proportion defective, $P$, will vary
considerably. For economic reasons such as keeping the rescoring to
a minimum, the limits of confidence for errors of the first
($\alpha$) and second ($\beta$) kind should be placed at the five
percent level. There seems to be no reason for placing one
confidence limit ($\alpha$ or $\beta$) higher than the other, as
one should be as willing to reject papers (as poorly scored) as to
accept the scorings. This is not always true, however; in certain
circumstances, such as preparing the norms for an important test,
it would be desirable to err in the direction of failing to accept
the scoring of a sample of papers.


AMOUNT OF AGREEMENT AMONG FIVE MACHINE SCORERS

Number of Perfect Agreements Out of 100 Papers*

                    Scorer
Scorer        1      2      3      4
   2         55
   3         52     70
   4         83     55     57
   5         55     75     75     59

Number of Times Scorers Agreed Within ± 1 Point on 100 Papers*

                    Scorer
Scorer        1      2      3      4
   2         95
   3         94     95
   4         99     98     93
   5         97     98     94     97

*Test is a 60 item test for which the scoring formula is R - 1/3W.
The 100 papers were selected as a random group of papers that
routinely come in from the field.


FIGURE 1. CONTROL CHART FOR ERRORS IN SCORING


* The opinions or conclusions contained in this report are those of
the authors, and are not to be construed as reflecting the views or
indorsement of the Department of the Army.

**H. F. Dingman now at Pacific State Hospital, Pomona, California;
W. G. Hoyt now at Systems Devel -

Wald (2), in a discussion of the formulae for testing the mean of a
binomial distribution, presents graphical procedures for
determining the acceptability of a sample. In Figure 1 the
cumulative number of disagreements between two scorings is plotted
for each trial. When the line of plots of cumulative disagreements
in scoring crosses the rejection line, then the sample of papers
should be rejected as poorly scored. If the line of plots of
cumulative disagreements crosses the acceptance line, we accept the
sample as correctly scored. If the line of plots does not cross the
acceptance line or rejection line, one must keep on testing and
plotting until all the papers in the sample are checked.
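
A minimal sketch of how the two lines of such a chart might be computed is given below. It assumes Wald's standard formulas for the binomial sequential probability ratio test together with the values P0 = .04, P1 = .09, and five percent risks of both kinds stated earlier. The function names and the example sequences are hypothetical and do not come from the original report.

```python
from math import log

def sprt_lines(p0, p1, alpha, beta):
    """Return (slope, h_accept, h_reject) for Wald's binomial SPRT chart.

    Acceptance line: d = -h_accept + slope * m
    Rejection line:  d =  h_reject + slope * m
    where d is the cumulative number of disagreements after m papers.
    """
    g = log(p1 / p0) + log((1 - p0) / (1 - p1))
    slope = log((1 - p0) / (1 - p1)) / g
    h_accept = log((1 - alpha) / beta) / g
    h_reject = log((1 - beta) / alpha) / g
    return slope, h_accept, h_reject

def decide(disagreement_flags, p0=0.04, p1=0.09, alpha=0.05, beta=0.05):
    """Walk through rescored papers; each flag is True for a disagreement."""
    slope, h_accept, h_reject = sprt_lines(p0, p1, alpha, beta)
    d = 0
    for m, flag in enumerate(disagreement_flags, start=1):
        d += int(flag)
        if d >= h_reject + slope * m:
            return "reject", m
        if d <= -h_accept + slope * m:
            return "accept", m
    return "continue", len(disagreement_flags)

# Hypothetical run: sixty rescored papers with no disagreements at all.
print(decide([False] * 60))
```

Under these assumptions the slope is about .062 disagreements per paper and both intercepts are about 3.4, so a run with no disagreements first reaches the acceptance line at the 56th paper.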

If the Average Sample Number function of this test procedure is
computed, it will be seen that if a large sample of papers has 96
percent of its papers with correct scores, on the average only 140
papers will need to be re-scored in order to accept the sample as
correctly scored.
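
The figure of 140 papers can be reproduced from Wald's approximate formula for the Average Sample Number when the true proportion defective equals P0. The sketch below assumes that formula and the values P0 = .04, P1 = .09, and five percent risks of both kinds; the function name is illustrative.

```python
from math import log

def asn_at_p0(p0, p1, alpha, beta):
    """Wald's approximate Average Sample Number when the true proportion is p0."""
    log_a = log((1 - beta) / alpha)      # log of the upper (rejection) threshold
    log_b = log(beta / (1 - alpha))      # log of the lower (acceptance) threshold
    # Expected log likelihood ratio contributed by one paper when p = p0.
    ez = p0 * log(p1 / p0) + (1 - p0) * log((1 - p1) / (1 - p0))
    # At p = p0 the sample is accepted with probability 1 - alpha.
    return ((1 - alpha) * log_b + alpha * log_a) / ez

asn = asn_at_p0(0.04, 0.09, 0.05, 0.05)
print(round(asn))
```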


Summary 


A procedure has been presented for providing 
economical checks of the accuracy of IBM answer 
sheet scoring. This procedure was devised to 
routinely check samples from large groups of 
fairly long tests that had been administered and 
scored in field units. The procedure consists in 
constructing and maintaining a control chart for 
all errors in scoring greater than one point. If 
the sample indicates that the percent of acceptable 
papers in the group is 96 or greater, then the 
group is declared acceptable. 

Users in other situations that require more (or less) rigorous
standards can easily construct their own control charts by
consulting Wald (2).
A SUPPLEMENTARY NOTE ON “MAIN EFFECTS
AND NON-ZERO INTERACTIONS
IN A TWO-WAY CLASSIFICATION”


RAYMOND O. COLLIER, Jr. 
University of Minnesota 


THE AUTHOR wishes to thank Joe H. Ward and Robert A. Bottenberg for
pointing up an error in his paper [1] and also for motivating him
to look more basically at the problem with which that paper was
concerned. It was maintained there in equation (15) that, under the
hypothesis $H_0: \alpha_i = 0$, with the side conditions of [1],
the residual sum of squares was

(1)  $SS'(E) = SS(E)$.

Upon rechecking the minimization process it was found that the
correct result should have been

(2)  $SS'(E) = SS(E) + SS(\alpha)$.

Thus the test of $H_0: \alpha_i = 0$, as referred to in [1], along
with the above side conditions does proceed as usual and is made by
means of

(3)  $F = \dfrac{SS(\alpha)/(I-1)}{SS(E)/IJ(K-1)}$.


The problem can be viewed more basically, however, and the question
asked, “Referring to [1], exactly what is being tested in
$H_0: \alpha_i = 0$?” For an answer it is possible to utilize the
concept of estimability.

Consider the following model:

(4)  $X_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}$,

where $\mu$, $\alpha$, $\beta$, $\gamma$ are fixed effects, the
$\epsilon_{ijk}$ are random effects as defined in [1] and
$i = 1, 2, \ldots, I$; $j = 1, 2, \ldots, J$; $k = 1, 2, \ldots, K$.
The expected value of $X_{ijk}$ is given by

$E(X_{ijk}) = \mu + \alpha_i + \beta_j + \gamma_{ij} = \xi_{ij}$.


Now (4) can be written in matrix form as $X = YB + E$ where $X$ is
$(IJK) \times 1$, $Y$ is $(IJK) \times (IJ + I + J + 1)$, $B$ is
$(IJ + I + J + 1) \times 1$ and $E$ is $(IJK) \times 1$. It can be
shown that the rank of $Y$ is $IJ$, which implies that just $IJ$
linearly independent linear functions of the parameters $\mu$,
$\alpha$, $\beta$, $\gamma$ in $B$ are estimable. (A linear
function of parameters is said to be estimable if it is possible to
construct a linear function of the observations, an estimate, which
is unbiased.)
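
The statement about the rank of Y can be checked numerically for small layouts. The sketch below builds the design matrix of model (4) with columns ordered as mu, the alphas, the betas, and the gammas, and finds its rank by exact elimination. The column ordering, the function names, and the choice I = 2, J = 3, K = 2 are illustrative assumptions rather than part of the original note.

```python
from fractions import Fraction
from itertools import product

def design_matrix(I, J, K):
    """Rows of Y for model (4): one row per observation (i, j, k)."""
    rows = []
    for i, j, _k in product(range(I), range(J), range(K)):
        mu = [1]
        alpha = [1 if a == i else 0 for a in range(I)]
        beta = [1 if b == j else 0 for b in range(J)]
        gamma = [1 if (a, b) == (i, j) else 0
                 for a, b in product(range(I), range(J))]
        rows.append(mu + alpha + beta + gamma)
    return rows

def rank(matrix):
    """Rank by Gauss-Jordan elimination in exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

I, J, K = 2, 3, 2
Y = design_matrix(I, J, K)
print(len(Y), len(Y[0]), rank(Y))
```

For this layout Y has IJ + I + J + 1 = 12 columns but rank 6 = IJ, because the columns for mu, the alphas, and the betas are all sums of gamma columns.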


Three theorems regarding estimability are provided
by Kempthorne [2:78] which will aid in this
problem. They are (a) any linear function of the 
parameters in a linear model is estimable if it is 
a linear function of the expectations of the observa- 
tions; (b) any reparametrization leads to the same 
estimate of an estimable function; and (c) it is pos- 
sible to test hypotheses only about estimable func- 
tions. 

Utilizing (a), it is obvious that $\xi_{ij}$ is estimable and
therefore any linear function of the $\xi_{ij}$ is also estimable.
However, no linear function of the $\alpha_i$ in (4) is estimable
unless restrictions are imposed on the parameters, i.e., they are
reparametrized. It is clear, then, from (c) that the hypothesis
$H_0: \alpha_i = \alpha$ in (4) cannot be tested.

If one labels the reparametrized $\alpha_i$ of [1] as $\alpha_i^*$
and notes that the estimate of $\alpha_i$ was given as
$\hat{\alpha}_i^* = X_{i..}/JK - X_{...}/IJK$, it is possible to
invoke (b) above and ascertain just what is being estimated by
$\hat{\alpha}_i^*$. It is seen that the expected value of
$\hat{\alpha}_i^*$ is
$\alpha_i + \gamma_{i.}/J - \alpha_{.}/I - \gamma_{..}/IJ$ and
that, moreover, the hypothesis $H_0: \alpha_i = 0$ from [1] is in
reality the hypothesis $H_0: \alpha_i^* = 0$, equivalent to
$H_0: \alpha_i + \gamma_{i.}/J = \alpha_{.}/I + \gamma_{..}/IJ$.

This last hypothesis is exactly the hypothesis
$H_0: \xi_{i.}/J = \xi$, where $\xi = \xi_{..}/IJ$, so that we have
$H_0: \xi_{i.}/J = \xi_{..}/IJ$.
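
The expectation used in this argument can be verified term by term from model (4). The short derivation below assumes that the random effects have zero expectation and writes a dot for summation over the replaced subscript, as in the text.

```latex
\begin{align*}
E\!\left(\frac{X_{i..}}{JK}\right)
  &= \mu + \alpha_i + \frac{\beta_{.}}{J} + \frac{\gamma_{i.}}{J}
   = \frac{\xi_{i.}}{J}, \\
E\!\left(\frac{X_{...}}{IJK}\right)
  &= \mu + \frac{\alpha_{.}}{I} + \frac{\beta_{.}}{J} + \frac{\gamma_{..}}{IJ}
   = \frac{\xi_{..}}{IJ}, \\
E\!\left(\hat{\alpha}_i^{*}\right)
  &= \alpha_i + \frac{\gamma_{i.}}{J} - \frac{\alpha_{.}}{I} - \frac{\gamma_{..}}{IJ}
   = \frac{\xi_{i.}}{J} - \frac{\xi_{..}}{IJ}.
\end{align*}
```

The terms in $\mu$ and $\beta$ cancel in the difference, which is why setting the expectation to zero gives a statement about the row means of the $\xi_{ij}$ and not about the $\alpha_i$ alone.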


Summary 


The point to the above remarks is as follows: In a two-way
classification, assuming the interaction model of (4), a test of
the main effect hypothesis $H_0: \alpha_i = \alpha$ is impossible.
That which can be tested is $H_0: \xi_{i.}/J = \xi_{..}/IJ$, which
is identical to testing $H_0: \alpha_i + \gamma_{i.}/J$ = a
constant = $\alpha_{.}/I + \gamma_{..}/IJ$ and is made by means of
(3) as usual. The test of the hypothesis in the reparametrized
model is expressly a test of the two immediately preceding
hypotheses.





